ABSTRACT

We describe the automated spectral classification, redshift determination, and parameter measurement
pipeline in use for the Baryon Oscillation Spectroscopic Survey (BOSS) of the Sloan Digital Sky
Survey III (SDSS-III) as of the survey's Ninth Data Release (DR9), encompassing 831,000 moderate- 
resolution optical spectra. We give a review of the algorithms employed, and describe the changes to 
the pipeline that have been implemented for BOSS relative to previous SDSS-I/II versions, including 
new sets of stellar, galaxy, and quasar redshift templates. For the color-selected "CMASS" sample of 
massive galaxies at redshift 0.4 < z < 0.8 targeted by BOSS for the purposes of large-scale cosmologi- 
cal measurements, the pipeline achieves an automated classification success rate of 98.7% and confirms 
95.4% of unique CMASS targets as galaxies (with the balance being mostly M stars). Based on visual 
inspections of a subset of BOSS galaxies, we find that approximately 0.2% of confidently reported 
CMASS sample classifications and redshifts are incorrect, and about 0.4% of all CMASS spectra are 
objects unclassified by the current algorithm which are potentially recoverable. The BOSS pipeline 
confirms that ~51.5% of the quasar targets have quasar spectra, with the balance mainly consisting of 
stars and low signal-to-noise spectra. Statistical (as opposed to systematic) redshift errors propagated 
from photon noise are typically a few tens of km s$^{-1}$ for both galaxies and quasars, with a significant
tail to a few hundreds of km s$^{-1}$ for quasars. We test the accuracy of these statistical redshift error
estimates using repeat observations, finding them underestimated by a factor of 1.19 to 1.34 for galax- 
ies, and by a factor of 2 for quasars. We assess the impact of sky-subtraction quality, signal-to-noise 
ratio, and other factors on galaxy redshift success. Finally, we document known issues with the BOSS 
DR9 spectroscopic data set, and describe directions of ongoing development. 

Subject headings: methods: data analysis — techniques: spectroscopic — surveys 
1. INTRODUCTION

The Sloan Digital Sky Survey III (SDSS-III,
Eisenstein et al. 2011) is the third phase of the SDSS
(York et al. 2000).$^{21}$ Within the SDSS-III, the Baryon
Oscillation Spectroscopic Survey (BOSS, Dawson et al.
2012) is currently mapping a larger volume of the universe
than any previous spectroscopic survey. The Ninth
Data Release of the SDSS-III (DR9, Ahn et al. 2012;
released publicly on 2012 July 31) is the first SDSS-III data
release to include BOSS spectroscopic data, and comprises
good observations of 831 unique plate-pluggings
of 813 unique tilings (plates worth of targets) on the sky.
Each plate delivers simultaneous spectroscopic observa- 
tions of 1000 lines of sight with optical fibers that feed a 
pair of two-arm spectrographs, giving a total of 831,000 
BOSS DR9 spectra. 

The main science goal of BOSS is to trace the large-scale
mass structure of the universe using massive galaxies
and quasar Lyα absorption systems, in order to measure
the length scale of the "baryon acoustic oscillation"
feature in the spatial correlation function of these objects
(e.g., Eisenstein et al. 2005), and thereby to constrain
the nature of the dark energy that drives the accelerated
expansion of the present-day universe. To meet
this goal, the BOSS project has specified a series of scien- 
tific requirements, including: (1) an RMS galaxy redshift 
precision better than 300 km s$^{-1}$; (2) a galaxy redshift
success rate of at least 94%, including both targeting in- 
efficiency and spectroscopic redshift failure; (3) a catas- 
trophic galaxy redshift error rate of less than 1%; and 
(4) spectroscopic confirmation of at least 15 quasars at 
2.2 < z < 3.5 per deg$^2$ from among no more than
40 targets per deg$^2$. To satisfy these requirements
within such a large survey, automated spectroscopic cal- 
ibration, extraction, classification, and redshift measure- 
ment methods are essential. 

This paper, one of a series of technical papers de- 
scribing SDSS-III DR9 in general and the BOSS data 
set in particular, presents the automated classification 
and redshift measurement software for the main galaxy 
and quasar target samples implemented for the BOSS 
project. This software is written in the IDL lan- 
guage, and is titled idlspec2d. Earlier versions of 
this code were used to analyze SDSS-I/II data (see
Aihara et al. 2011), alongside the complementary and
independently developed pipeline software spectro1d
(see SubbaRao et al. 2002 and Adelman-McCarthy et al.
2006); for the BOSS project, the idlspec2d software has
been adopted as the primary code, due to its robust error 
estimation methods and its tight integration of redshift 
measurement and classification with the lower-level oper- 
ations of raw data calibration and extraction. The code 
has also been upgraded with new redshift-measurement 
templates and several new algorithms in order to meet 
the scientific requirements of the BOSS project. The 
tagged software version v5_4_45 was used to process all 
BOSS spectroscopic data for DR9,$^{22}$ and the classification
and redshift results delivered by this code have been
used for recently published BOSS DR9-sample cosmological
analyses (Anderson et al. 2012; Manera et al. 2012;
Nuza et al. 2012; Reid et al. 2012; Ross et al. 2012a;
Sanchez et al. 2012; Tojeiro et al. 2012). An overview
of the BOSS project, including experimental design, scientific
goals, observational operations, and ancillary programs,
is given in Dawson et al. (2012). A description of
the idlspec2d calibration and extraction methods that
transform raw CCD pixel data into one-dimensional object
spectra will be presented in Schlegel et al. (2012).

The organization of this paper is as follows. Section 2
presents an overview of the spectroscopic data sample of
BOSS DR9. Section 3 describes the classification and
redshift pipeline algorithms and procedures, including
the core redshifting algorithm (§3.1), special classification
handling for the galaxy target samples (§3.2), measured
spectroscopic parameters (§3.3), and output files
(§3.4). Section 4 describes the templates constructed
for the automated spectroscopic identification and redshift
analysis of BOSS galaxies (§4.1), quasars (§4.2), and
stars (§4.3). Section 5 analyzes the completeness, purity,
accuracy, and precision of the samples classified and measured
by the idlspec2d pipeline. Section 6 documents
known issues in the DR9 release of BOSS data, and §7
provides a summary and conclusions.

22 The DR9 tagged version of idlspec2d can be obtained at
www.sdss3.org/svn/repo/idlspec2d/tags/v5_4_45/.

2. DATA OVERVIEW 

The main BOSS survey program consists of two galaxy
target samples (Padmanabhan et al. 2012) and a quasar
target sample including both color-selected candidates
and known quasars (Bovy et al. 2011; Kirkpatrick et al.
2011; Ross et al. 2012b). The galaxy samples are designated
CMASS (for "constant mass") and LOWZ (for
"low-redshift"). The LOWZ galaxy sample is composed
of massive red galaxies spanning the redshift range
0.15 < z < 0.4. The CMASS galaxy sample is composed
of massive galaxies spanning the redshift range
0.4 < z < 0.7. Both samples are color-selected to provide
near-uniform sampling over the combined volume.
The faintest galaxies are at r = 19.5 for LOWZ and
i = 19.9 for CMASS. Colors and magnitudes for the
galaxy selection cuts are corrected for Galactic extinction
using Schlegel et al. (1998) dust maps. The BOSS
quasar sample is selected to recover as many objects as
possible in the redshift range 2.2 < z < 3.5 for the purposes
of measuring the 3D structure in the Lyα forest.
A variety of selection algorithms are employed to select 
the quasar sample, which lies close to the color locus of 
F stars. The faint-end magnitude limits of the quasar 
target sample are extinction-corrected PSF magnitudes 
of g = 22 and r = 21.85. 

A summary of the DR9 BOSS spectroscopic data
set (observed between 2009 December and 2011 July)
is given in Table 1, along with performance metrics
that will be discussed in detail further below. Representative
BOSS survey spectra are shown in Figure 1.
The automated classification and measurement software
described here is applied to all spectra obtained by
the BOSS spectrographs (Smee et al. 2012), including
spectra targeted under ancillary programs described in
Dawson et al. (2012). In this work we focus on the analysis
of the main BOSS galaxy and quasar survey targets,
since the performance on these samples is the primary 
scientific driver of the design, development, and verifica- 
tion of the pipeline. 

For the purposes of this paper, we define the samples 
of unique LOWZ and CMASS spectra according to the 
following cuts: 

1. Selected by the appropriate sample color cuts (encoded
by bit 0 of the BOSS_TARGET1 mask for the
LOWZ sample, and by bit 1 of that mask for the
CMASS sample). The LOWZ and CMASS samples
are not mutually exclusive, although they are
mostly non-overlapping.

2. Observed with a spectroscopic fiber that is well
plugged, successfully mapped to the target object,
and not affected by bad CCD columns that remove
a large fraction of the wavelength coverage.
These conditions are reported via bits 1 and 7 of
the ZWARNING bitmask described in §3.1.

3. Apparent (not extinction-corrected) i-band magnitude
less than 21.5 within a 2"-diameter circular
aperture, corresponding to the angular size
of the BOSS fibers. This criterion excludes low
surface-brightness targets for which the spectroscopic
signal-to-noise ratio (S/N) becomes unacceptably
low for nominal survey exposure times of
60 minutes in good conditions.

4. Best single observation within the survey data set,
for the case of multiply observed spectra. This designation
is described later in this paper.

Fig. 1. Mosaic of representative BOSS spectra, with a resolution of R ≈ 2000. Black lines show data (smoothed over a 5-pixel window),
cyan lines show the best-fit redshift/classification model, and red lines show the 1-σ noise level estimated by the extraction pipeline. Spectra are
labeled by PLATE-MJD-FIBERID. Individual objects are: (a) redshift z = 0.256 LOWZ galaxy; (b) redshift z = 0.649 CMASS galaxy; (c)
redshift z = 0.669 CMASS galaxy with post-starburst continuum; (d) redshift z = 0.217 starburst galaxy (from QSO target sample); (e)
redshift z = 2.873 quasar; (f) redshift z = 0.661 quasar; (g) spectrophotometric standard star; (h) M star (from CMASS target sample).
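The galaxy-sample cuts listed above can be sketched as a simple predicate. This is only an illustration: the function and argument names are hypothetical (they are not idlspec2d or catalog column names), and the LOWZ selection is assumed to sit in bit 0 of the BOSS_TARGET1 mask.

```python
# Bit positions as quoted in the text; bit 0 for LOWZ is an assumption.
LOWZ_BIT = 0
CMASS_BIT = 1
LITTLE_COVERAGE_BIT = 1   # ZWARNING bit 1
UNPLUGGED_BIT = 7         # ZWARNING bit 7
IFIBER_MAG_LIMIT = 21.5   # apparent i-band magnitude in a 2" aperture


def bit_set(mask, bit):
    """Return True if the given bit is set in an integer bitmask."""
    return ((mask >> bit) & 1) == 1


def in_unique_galaxy_sample(boss_target1, zwarning, ifiber_mag,
                            is_best_observation, sample="CMASS"):
    """Apply cuts 1-4 to one spectrum (all argument names are hypothetical)."""
    target_bit = CMASS_BIT if sample == "CMASS" else LOWZ_BIT
    selected = bit_set(boss_target1, target_bit)                # cut 1
    good_fiber = not (bit_set(zwarning, LITTLE_COVERAGE_BIT)
                      or bit_set(zwarning, UNPLUGGED_BIT))      # cut 2
    bright_enough = ifiber_mag < IFIBER_MAG_LIMIT               # cut 3
    return selected and good_fiber and bright_enough and is_best_observation  # cut 4
```

A target with both bits set passes the color cut for either sample, which reflects the statement that LOWZ and CMASS are not mutually exclusive.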



The sample of unique BOSS quasar spectra for the 
current work is defined according to the following cuts: 

1. Selected from one of four categories: known
quasars with redshifts optimal for Lyα forest analysis
(bit 12 of the BOSS_TARGET1 mask), quasars selected
from the FIRST survey (bit 18; Becker et al.
1995), and candidates from the BOSS "Core"
and "Bonus" quasar candidate selection algorithms
(bits 40 and 41 respectively; see Ross et al. 2012b).
TABLE 1

BOSS DR9 SUMMARY SPECTRUM TOTALS

Item                                                         Number

Plate pluggings                                                 831
Unique plates                                                   819
Unique tiles (plates worth of targets)                          813
Spectra                                                      831000
Effective spectra (a)                                        829073
Unique spectra                                               763425

CMASS sample spectra                                         353691
Unique CMASS spectra                                         324198
Unique CMASS with ZWARNING_NOQSO == 0 (b)                    320031
Unique CMASS that are galaxies                               309307

LOWZ sample spectra                                          111347
Unique LOWZ spectra                                          103729
Unique LOWZ with ZWARNING_NOQSO == 0 (b)                     103610
Unique LOWZ that are galaxies                                102890

CMASS && LOWZ sample spectra                                   3201
Unique CMASS && LOWZ spectra                                   2990
Unique CMASS && LOWZ with ZWARNING_NOQSO == 0 (b)              2976
Unique CMASS && LOWZ that are galaxies                         2935

Quasar sample spectra (c)                                    166034
Unique quasar sample spectra                                 154433
Unique quasar sample with ZWARNING == 0 (d)                  122488
Unique quasar sample spectra that are quasars                 79570
Number of above with 2.2 < z < 3.5                            51903
Unique quasar sample scanned visually                        154173
Visual 2.2 < z < 3.5 quasars missed by pipeline                 895
Pipeline 2.2 < z < 3.5 QSOs with visual disagreement (e)        327

Sky spectra                                                   78573
Unique sky-spectrum lines of sight                            75850
Spectrophotometric standard star spectra                      16905
Unique standard star spectra                                  14915
Ancillary program spectra                                     32381
Unique ancillary target spectra                               28968
Other spectra (commissioning, calibration, etc.)              74620
Unique other spectra                                          65461

(a) Excludes unplugged fibers and spectra falling on bad CCD
columns.

(b) ZWARNING_NOQSO == 0 indicates a confident spectroscopic classification
and redshift measurement for galaxy targets (see §3.1 and
§3.2).

(c) "Quasar targets" tabulated here are from the main survey quasar
sample, and exclude any ancillary and calibration quasar targets.

(d) ZWARNING == 0 indicates a confident spectroscopic classification
and redshift measurement for quasar targets (see §3.1).

(e) "Visual disagreement" is either |Δz| > 0.05 between pipeline
and visual inspections, or absence of confident visual classification
& redshift.

2. Plugged, mapped, and well-covered in wavelength
(as for Item 2 of the previous list for galaxy targets).

3. Best single observation within the survey data set
(as for Item 4 of the previous list).
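As a consistency check, the headline rates quoted in the abstract follow from the Table 1 counts of unique spectra. The short sketch below reproduces that arithmetic; the dictionary keys are illustrative names, and the "confident" count is taken to be the ZWARNING_NOQSO == 0 row.

```python
# Counts copied from Table 1 (unique spectra only); key names are illustrative.
TABLE1 = {
    "cmass_unique": 324198,
    "cmass_confident": 320031,   # unique CMASS with ZWARNING_NOQSO == 0
    "cmass_galaxies": 309307,
    "qso_unique": 154433,
    "qso_confirmed": 79570,      # unique quasar-sample spectra that are quasars
}


def percent(numerator, denominator):
    """Express a ratio of counts as a percentage."""
    return 100.0 * numerator / denominator


cmass_success = percent(TABLE1["cmass_confident"], TABLE1["cmass_unique"])
cmass_galaxy_fraction = percent(TABLE1["cmass_galaxies"], TABLE1["cmass_unique"])
qso_fraction = percent(TABLE1["qso_confirmed"], TABLE1["qso_unique"])

print(f"CMASS automated classification success: {cmass_success:.1f}%")
print(f"CMASS targets confirmed as galaxies:    {cmass_galaxy_fraction:.1f}%")
print(f"Quasar targets confirmed as quasars:    {qso_fraction:.1f}%")
```

These evaluate to 98.7%, 95.4%, and 51.5%, matching the figures given in the abstract.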

3. PIPELINE OVERVIEW 

Imaging and spectroscopic data for the BOSS Survey
are obtained with the 2.5-m Sloan Telescope at
Apache Point Observatory (Gunn et al. 2006), first with
the imaging camera (Gunn et al. 1998) and then with
an upgraded (relative to SDSS-I/II) spectrograph system
capable of obtaining 1000 spectra simultaneously using
optical fibers plugged into a drilled aluminum focal plate
and feeding two double-arm spectrographs. The characteristics
of this instrument are summarized in Table 2
and described in detail by Smee et al. (2012). The outputs
of the fibers feeding each spectrograph are arrayed
linearly along a "slit-head" and numbered within the
spectroscopic pipeline by the sequential index FIBERID,
which by convention runs from 1-500 in spectrograph 1
and from 501-1000 in spectrograph 2. A unique physical
target plate is specified by the PLATE identifier. Since
the same plate can be plugged and observed on multiple
occasions, with different mappings between fibers and
target holes, the modified Julian date of a unique plugging
is tracked as well via the MJD parameter. Together,
the combination of PLATE, MJD, and FIBERID constitutes
a unique identifier for a BOSS spectrum (as was also the
case for SDSS-I/II spectra). Each plugging is observed
with multiple exposures, each exactly 15 minutes long,
which can be distributed across more than
one night of observation. Typically four to six exposures
are required to attain sufficient S/N per pixel at a fiducial
magnitude. All good data from a unique plugging
are co-added during the spectroscopic data reduction
process. Data from different pluggings are never
combined.

TABLE 2

BOSS SPECTROGRAPH SYSTEM CHARACTERISTICS

Parameter               Value

On-sky field of view    3° diameter
Fiber aperture          2" diameter
Multiplex capability    1,000 objects
Wavelength coverage     3,600 Å < λ < 10,400 Å
Spectral resolution     λ/Δλ ≈ 2,000
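The identifier conventions described above can be illustrated with a few lines of Python. The helper names are hypothetical, the zero-padded label format is only a presentation choice assumed here (the text specifies just the PLATE-MJD-FIBERID triple), and the plate and MJD values in the usage note are made up.

```python
def spectrograph_for_fiber(fiberid):
    """Map FIBERID to its spectrograph: 1-500 -> 1, 501-1000 -> 2."""
    if not 1 <= fiberid <= 1000:
        raise ValueError("FIBERID must lie in the range 1-1000")
    return 1 if fiberid <= 500 else 2


def spectrum_key(plate, mjd, fiberid):
    """Build a PLATE-MJD-FIBERID label (zero padding is an assumption)."""
    return f"{plate}-{mjd}-{fiberid:04d}"


def total_exposure_minutes(n_exposures, minutes_per_exposure=15):
    """Total integration time for a plugging observed n_exposures times."""
    return n_exposures * minutes_per_exposure
```

For example, `spectrum_key(4000, 55500, 17)` gives `"4000-55500-0017"`, and four to six 15-minute exposures correspond to 60 to 90 minutes of total integration.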

The wavelength calibration, extraction, sky subtraction,
flux calibration, and co-addition of BOSS spectra
from raw CCD pixel data are described in Schlegel et al.
(2012). The extraction implementation is a variation
of the optimal-extraction algorithm described by
Hewett et al. (1985) and Horne (1986), including a
forward-modeling solution that de-blends the cross-talk
between neighboring fibers on the CCD. (A similar approach
is described in Sandin et al. 2010.) The outputs
of this "two-dimensional" pipeline software are stored
on a plate-by-plate basis for sets of 1000 spectra in the
multi-extension "spPlate" FITS files, which are the inputs
to the "one-dimensional" (1D) pipeline software described
in this work. The full contents of the spPlate
files are described in detail in the online data model;$^{23}$
for the purposes of the redshift measurement and classi- 
fication pipeline, the most important products are: 

1. Wavelength-calibrated, sky-subtracted, flux- 
calibrated, and co-added object spectra, rebinned 
onto a uniform baseline of $\Delta \log_{10} \lambda = 10^{-4}$ (about
69 km s$^{-1}$) per pixel.

2. Statistical error-estimate vectors for each spectrum 
(expressed as inverse variance) incorporating con- 
tributions from photon noise, CCD read noise, and 
sky-subtraction error. 

3. Mask vectors for each spectrum identifying pixels
where warning conditions exist in either any
(ORMASK) or all (ANDMASK) of the spectra contributing
to the final co-added spectrum.

23 http://www.sdss3.org/dr9/
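The pixel scale quoted in the list above can be checked directly: on a grid that is uniform in $\log_{10}\lambda$, one pixel corresponds to a velocity width of $c \ln(10)\, \Delta\log_{10}\lambda$ in the low-velocity limit. The following sketch evaluates that relation; the function names are illustrative, and treating zero inverse variance as an unusable pixel is an assumption rather than something stated in the text.

```python
import math

C_KMS = 299792.458   # speed of light in km/s
DLOGLAM = 1.0e-4     # log10(wavelength) step per pixel, from the text


def pixel_velocity_kms(n_pixels=1, dloglam=DLOGLAM):
    """Velocity width of n_pixels on a constant log10-wavelength grid."""
    return C_KMS * math.log(10.0) * dloglam * n_pixels


def n_pixels_between(lam_min, lam_max, dloglam=DLOGLAM):
    """Number of log-spaced pixels spanning lam_min to lam_max."""
    return int(round(math.log10(lam_max / lam_min) / dloglam))


def sigma_from_ivar(ivar):
    """Convert inverse variance to a 1-sigma error.

    Assumption: an inverse variance of zero marks a pixel with no
    usable information, returned here as an infinite error.
    """
    return 1.0 / math.sqrt(ivar) if ivar > 0.0 else math.inf
```

With these definitions, one pixel is about 69 km s$^{-1}$ and four pixels about 276 km s$^{-1}$, and the 3,600 Å to 10,400 Å range spans roughly 4,600 pixels, in line with the ~4500 pixels per spectrum quoted in §3.1.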

3.1. Redshift measurement and classification 

The BOSS spectral classification and redshift-finding 
analysis is approached as a $\chi^2$ minimization problem.
Linear fits are made to each observed spectrum using
multiple templates and combinations of templates evaluated
for all allowed redshifts, and the global minimum-$\chi^2$
solution is adopted for the output classification and redshift.
This approach requires the spectra and their errors
to be well-understood, and requires the template spec- 
tra to sufficiently span the observed space. Both these 
conditions are satisfied for the BOSS galaxy and quasar 
targets, resulting in accurate redshift fits and enabling 
a quantitative assessment of the confidence of those fits. 
By performing a statistically objective analysis, confident 
redshifts are obtained even for data at lower S/N where 
manual inspection may fail. 

The basic outputs of the redshift determination and 
classification algorithm described in this section are 
the measured redshift Z, its associated 1-sigma statistical
error Z_ERR, a classification category CLASS (either
"GALAXY", "QSO" for quasar, or "STAR"), and a confidence
flag ZWARNING that is zero for confident measurements
and non-zero otherwise. Section 3.2 describes special
variations on these outputs that are recommended
for use with the BOSS LOWZ and CMASS galaxy sam- 
ple spectra. 

The least-squares minimization is performed by comparison
of each spectrum to a full range of templates
spanning galaxies, quasars, and stars. A range of redshifts
is explored, with trial redshifts spaced every pixel
(69 km s$^{-1}$) for most templates and spaced by four pixels
(276 km s$^{-1}$) for quasar templates. At each redshift the
spectrum is fit with an error-weighted least-squares linear
combination of redshifted template "eigenspectra" in
combination with a low-order polynomial. The polynomial
terms absorb Galactic extinction, intrinsic extinction,
and residual spectro-photometric calibration errors
(typically at the 10% level) that are not fully spanned
by the eigenspectra. The template basis sets are derived
from rest-frame principal-component analyses (PCA) of
training samples of galaxies, quasars, and cataclysmic
variable (CV) stars, and from a set of archetype spectra for
other types of stars. CV stars are handled as a separate
class from other stars due to their significant range of
emission-line strengths. The construction of these basis
sets is described in detail in §4 below. This best-fit combination
model gives a $\chi^2$ value for that trial redshift,
and these values define a $\chi^2(z)$ curve when computed
across the redshift range under consideration. To facilitate
comparison between template classes with differing
numbers of basis vectors, these $\chi^2$ values are analyzed in
reduced form $\chi^2_r$, i.e., divided by the number of degrees
of freedom. In practice this is nearly equivalent to working
in terms of $\chi^2$ for any given spectrum, as the number
of pixels (~4500) greatly exceeds the number of free parameters
in all fits. The best redshifts for a particular
class under consideration are defined by the locations of
the lowest minima in the $\chi^2_r$ curve, where that curve is
fit by a quadratic function using the five points nearest
the minimum (11 points for quasars). The initial quasar
fits, in which the trial redshifts are spaced every four pixels,
are re-fit near the five best fits with redshifts spaced every
pixel. This two-step fitting for the quasars is done for
computational efficiency, since most of the computational
time is spent evaluating quasar fits, which are performed
on all targets. The statistical error on the final redshift
is evaluated at the location of the minimum of the $\chi^2$
curve as the change in redshift $\pm\delta z$ for which $\chi^2$ increases
by one above the minimum value. Noise in the spectra
can result in multiple local minima in the neighborhood
of the global minimum that are not significant. These
are typically separated by a few pixels, or ~200 km s$^{-1}$.
For all template fits, we collapse minima separated by
less than 1000 km s$^{-1}$ to a single minimum. Solutions
with separations exceeding 1000 km s$^{-1}$ must be explicitly
evaluated since they represent catastrophic redshift
failures for BOSS galaxies if they are statistically indistinguishable
from one another (see panel "h" of Figure [TBI
further below). The redshift-finding procedure is shown
schematically in Figure 2. (See also Glazebrook et al.
1998.)
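The scan-and-refine procedure described above can be illustrated with a deliberately reduced sketch. It is not the idlspec2d implementation: a single template with one free amplitude stands in for the eigenspectrum-plus-polynomial basis, trial redshifts are integer pixel lags on the $\log_{10}\lambda$ grid, and all function names are hypothetical. The parabola is fit by least squares to the points nearest the minimum, and the error is the shift at which the fitted $\chi^2$ rises by one.

```python
import math

DLOGLAM = 1.0e-4  # log10(wavelength) step per pixel, as quoted in the text


def chi2_at_lag(flux, ivar, template, lag):
    """Chi^2 after fitting one amplitude to a template shifted by `lag` pixels."""
    stt = stf = sff = 0.0
    for i, t in enumerate(template):
        j = i + lag
        if 0 <= j < len(flux):
            w = ivar[j]
            stt += w * t * t
            stf += w * t * flux[j]
            sff += w * flux[j] * flux[j]
    return sff - (stf * stf / stt if stt > 0.0 else 0.0)


def refine_minimum(lags, chi2, half_width=2):
    """Least-squares parabola through the 2*half_width+1 points at the minimum.

    Returns (best_lag, lag_error); lag_error is where the parabola rises by 1.
    Assumes uniformly spaced trial lags.
    """
    k = min(range(len(chi2)), key=chi2.__getitem__)
    if k < half_width or k + half_width >= len(chi2):
        raise ValueError("minimum too close to the edge of the scanned range")
    xs = range(-half_width, half_width + 1)
    ys = [chi2[k + x] for x in xs]
    n = len(ys)
    s2 = sum(x * x for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    a = (n * sx2y - s2 * sy) / (n * s4 - s2 * s2)  # curvature coefficient
    b = sxy / s2                                   # linear coefficient
    if a <= 0.0:
        raise ValueError("chi^2 curve is not convex at the minimum")
    step = lags[1] - lags[0]
    return lags[k] - step * b / (2.0 * a), step / math.sqrt(a)


def lag_to_redshift(lag, lag_err):
    """Convert a pixel lag on the log10-wavelength grid to (z, z_err)."""
    z = 10.0 ** (lag * DLOGLAM) - 1.0
    return z, (1.0 + z) * math.log(10.0) * DLOGLAM * lag_err


# Toy data: a Gaussian "line" shifted by 7 pixels with unit inverse variance.
n = 200
template = [math.exp(-0.5 * ((i - 60) / 3.0) ** 2) for i in range(n)]
flux = [2.0 * template[j - 7] if j >= 7 else 0.0 for j in range(n)]
ivar = [1.0] * n
lags = list(range(0, 15))
chi2 = [chi2_at_lag(flux, ivar, template, lag) for lag in lags]
best_lag, lag_err = refine_minimum(lags, chi2)   # five-point fit
z_best, z_err = lag_to_redshift(best_lag, lag_err)
```

Passing `half_width=5` gives the 11-point fit used for quasars. A fuller sketch would solve a multi-column weighted least-squares problem at each trial redshift and compare reduced $\chi^2$ values across template classes.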

This core algorithm is applied within the pipeline ac- 
cording to the following sequence: 

1. Read the spectrum, error estimates, and mask 
vectors for a single spectroscopic plate from the 
spPlate file. 

2. Mask pixels outside the range 3600 Å-10400 Å,
pixels at wavelengths where the typical sky-subtraction
model residuals are more than 3-sigma
worse than the errors expected from a Poisson
model in any sub-exposure (BADSKYCHI set in the
ORMASK vector output by the reduction software;
Schlegel et al. 2012), pixels where the sky brightness
is in excess of the extracted object flux plus
ten times its statistical error in all sub-exposures
(BRIGHTSKY set in the ANDMASK), and pixels with
negative flux at greater than 10-σ significance.

3. Find the best five galaxy redshifts between z =
-0.01 and z = 1.0, using a rest-frame template
basis of four eigenspectra (§4.1).

4. Find the best five quasar redshifts between z =
0.0033 and z = 7.0, using a rest-frame template
basis of four eigenspectra (§4.2).

5. Find the best single redshift for each of 123 subclasses
of star from -1200 km s$^{-1}$ to +1200 km s$^{-1}$,
using a single rest-frame archetype spectrum for
each one (§4.3).

6. Find the best single cataclysmic variable star redshift
from -1000 km s$^{-1}$ to +1000 km s$^{-1}$, using a
rest-frame template basis of three eigenspectra (in
order to capture emission-line strength variations;
§4.3).

7. Sort all redshifts and classifications together by ascending
$\chi^2_r$, and assign the best spectroscopic redshift
and classification from among GALAXY, QSO
(quasar), and STAR (including CV) based on the
overall minimum $\chi^2_r$ across all classes and redshifts.
8. Compare the change in $\chi^2_r$ between the best classification
and redshift and the next-best classification
and redshift with a velocity difference greater
than 1000 km s$^{-1}$, and assign a low-confidence
"ZWARNING" flag if this difference is either less than
0.01 in absolute terms or less than 0.01 times the
overall minimum $\chi^2_r$ value. For the ~4500 degrees
of freedom typical of a BOSS spectrum, the
absolute threshold of $\Delta\chi^2_r = 0.01$ corresponds to
$\Delta\chi^2 \approx 45$ (naively interpreted as 6.7-sigma). The
relative requirement on $\Delta\chi^2_r$ serves to make the
statistical confidence threshold progressively more
conservative at higher S/N levels, where the redshift
template fits are worse in an absolute sense
but nevertheless have greater distinguishing power
among multiple hypotheses.
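Steps 7 and 8 of the sequence above amount to a ranking followed by a threshold test, which can be sketched as follows. The data layout (a list of `(reduced_chi2, z, class)` tuples) and the function names are hypothetical, and the velocity separation is approximated as $c\,|\Delta z|/(1+z)$, which is an assumption since the text does not spell out the formula.

```python
import math

C_KMS = 299792.458
SMALL_DELTA_CHI2 = 1 << 2   # bit 2 of the ZWARNING mask


def velocity_separation_kms(z_ref, z_other):
    """Approximate velocity difference between two redshifts (assumed form)."""
    return C_KMS * abs(z_other - z_ref) / (1.0 + z_ref)


def classify(candidates, abs_threshold=0.01, rel_threshold=0.01,
             min_sep_kms=1000.0):
    """Pick the lowest reduced chi^2 fit and test its confidence.

    candidates: list of (reduced_chi2, z, class_name) tuples.
    Returns (class_name, z, zwarning).
    """
    ranked = sorted(candidates, key=lambda c: c[0])          # step 7
    best_chi2, best_z, best_class = ranked[0]
    zwarning = 0
    for chi2, z, _ in ranked[1:]:                            # step 8
        if velocity_separation_kms(best_z, z) > min_sep_kms:
            delta = chi2 - best_chi2
            if delta < abs_threshold or delta < rel_threshold * best_chi2:
                zwarning |= SMALL_DELTA_CHI2
            break   # only the next-best well-separated fit is compared
    return best_class, best_z, zwarning


# The absolute threshold expressed as an unreduced chi^2 difference:
# 0.01 * 4500 = 45, and sqrt(45) is about 6.7.
naive_sigma = math.sqrt(0.01 * 4500)
```

In this sketch a fit with a minimum reduced $\chi^2$ of 2.0 is flagged when the runner-up lies within 0.02, illustrating how the relative criterion tightens the test when the absolute fit quality is worse.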

The threshold value of $\Delta\chi^2_r > 0.01$ used to assign confidence
to the classifications is empirically determined.
The threshold could be lowered further to recover more
redshifts, but at the cost of more mis-classifications and
incorrect redshifts. Tests on repeat CMASS sample data
show that decreasing the threshold to 0.008 (0.005) would
increase redshift completeness by 0.3% (0.6%), with 8%
(16%) of the added measurements being incorrect. (A
full analysis of BOSS galaxy redshift completeness and
purity is given in §5.1.) An additional test is made possible
by the nearly 80,000 blank BOSS sky spectra in
DR9. Among these, 2% fit a template with a confidence
$\Delta\chi^2_r > 0.01$, implying they would be assigned a confident
redshift in the absence of prior knowledge of their status
as blank sky spectra (although a small fraction of these
are in fact real objects detected spectroscopically in the
sky fibers).

As discussed above, the condition ZWARNING = 0 designates
that the BOSS pipeline has determined a confident
classification and redshift for a spectrum. The primary
source of ZWARNING ≠ 0 spectra is the $\Delta\chi^2_r$ threshold.
However, several other flags are also encoded bit-wise
in the ZWARNING mask, as documented in Table 3. The
definitions of the ZWARNING mask-bits in BOSS are identical
to their definitions in SDSS-I/II. Two of the bits
have been disabled for BOSS DR9, and are only retained
for historical consistency: (1) the NEGATIVE_MODEL bit,
which was previously triggered by stellar model fits with
negative amplitudes, which are now disallowed entirely;
and (2) the MANY_OUTLIERS bit, which was found to flag
too many good, high-S/N quasar redshifts in BOSS.
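Because the ZWARNING value is a bit-wise mask, a nonzero value can be unpacked into the named conditions of Table 3. The helper below is an illustrative sketch (the function name is hypothetical); the bit names and positions are those listed in the table.

```python
# Bit names in order of bit position (0 through 7), following Table 3.
ZWARNING_BITS = (
    "SKY",
    "LITTLE_COVERAGE",
    "SMALL_DELTA_CHI2",
    "NEGATIVE_MODEL",      # disabled for BOSS DR9
    "MANY_OUTLIERS",       # disabled for BOSS DR9
    "Z_FITLIMIT",
    "NEGATIVE_EMISSION",
    "UNPLUGGED",
)


def decode_zwarning(value):
    """Return the names of all flags set in a ZWARNING integer."""
    return [name for bit, name in enumerate(ZWARNING_BITS)
            if (value >> bit) & 1]


def is_confident(value):
    """A measurement is confident only when no warning bit is set."""
    return value == 0
```

For instance, `decode_zwarning(4)` returns `["SMALL_DELTA_CHI2"]`, and a value of 130 (bits 1 and 7) decodes to the two conditions used to reject spectra in the sample definitions of §2.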

Table 4 summarizes the number of PCA template
and polynomial degrees of freedom associated with
each spectroscopic object class, along with the name
of the file containing the template basis within the
idlspec2d/v5_4_45 software product. In most cases,
the number of PCA templates and number of polynomial
terms used in the redshift and classification analysis
match the SDSS-I/II idlspec2d values. The one
exception is the number of CV star templates, which
has decreased from four in SDSS-I/II to three in BOSS
due to a smaller available training sample at the time
the DR9 code version was frozen. For all target classes
we have verified that the choices are nearly optimal by
performing tests of the completeness and purity of classification
and redshift measurement (relative to visually
inspected subsets) as a function of the size of the PCA
and polynomial basis. As can be expected, increasing
the number of PCA and polynomial terms used for the
modeling of a particular class increases both the completeness
and the impurity of the resulting sample for
that class. Increases in impurity arise from both catastrophic
mis-classification and catastrophic redshift error,
with the former decreasing the completeness of other
classes. Each spectrum in the survey is fitted with all
classes of objects in order to determine a spectroscopic
redshift and classification that is independent of photometric
data and targeting information (but see §3.2 below).

Fig. 2. Schematic illustration of the idlspec2d redshift measurement
algorithm. The reduced $\chi^2$ curve as a function of redshift
(black curve) is determined from the best-fit linear combination
of template basis spectra at each trial redshift value. The best
redshift is defined by the location of the global minimum (green).
Subsidiary minima separated by less than 1000 km s$^{-1}$ are not considered
to be separate (pink). The curvature of a parabolic fit to
the $\chi^2_r$ curve at the global minimum (magenta) is used to determine
the best-fit redshift error estimate. The second-best redshift
fit is determined by the location of the second-lowest well-separated
$\chi^2_r$ minimum (blue). The difference $\Delta\chi^2_r$ (red) between best and
second-best redshifts is used to assign confidence in the measurement,
as described in the text.

3.2. Special galaxy target handling 

The implementation of the idlspec2d redshift code 
is designed to meet the BOSS scientific requirements 
on redshift success rates, as discussed in the Introduc- 
tion. The original SDSS-I/II code operated on spectra 
alone, without imposing classification or redshift priors 
based on photometric data or other targeting informa- 
tion. At the S/N typical of SDSS-I/II spectra, this tech- 
nique proved highly successful, resulting in a redshift 
success rate better than 99% for the main galaxy sam- 
ple and a negligible incidence of catastrophic errors. For 
the BOSS galaxy samples, however, some prior informa- 
tion from the targeting photometric catalog is needed to 
achieve the required redshift success rate. Specifically, 
we have found in practice that the LOWZ and CMASS 
targets can be galaxies, stars, or superpositions of the 
two, but are almost never quasars (however, see Item 2
in §6). Without using any prior information, the redshift
code produces an excess of erroneous quasar classifications
for CMASS targets due to unphysical quasar
basis-plus-polynomial combinations yielding the global
minimum $\chi^2_r$ (see panel "a" in Figure [TBI further below).

To remedy this, the adopted BOSS survey values
for the redshifts of LOWZ and CMASS galaxy
targets are taken from the parameters Z_NOQSO and
CLASS_NOQSO, together with the associated statistical
error estimates Z_ERR_NOQSO and confidence flags
ZWARNING_NOQSO, which represent the best-fit redshift
and classification determined through the procedure described
in §3.1 but excluding the consideration of quasar
template fits. This effectively imposes the red-color
and extended-image priors of the galaxy target sample
over the blue-color and point-like image priors of the
quasar target sample. The SMALL_DELTA_CHI2 bit for the
ZWARNING_NOQSO mask is set only on the absolute criterion
of $\Delta\chi^2_r < 0.01$ relative to the next-best non-quasar
model (i.e., with no relative $\Delta\chi^2_r$ cut). We recommend
the use of these "NOQSO" quantities for statistical analyses
of the BOSS galaxy samples. The parameter Z and
its associated values are also retained and reported for
consistency with the original SDSS-I/II approach, representing
the global minimum-$\chi^2_r$ redshift inclusive of all
spectral template classes.

TABLE 3

BOSS DR9 REDSHIFT AND CLASSIFICATION WARNING FLAGS (ZWARNING)

Bit  Name               Definition

0    SKY                Sky fiber
1    LITTLE_COVERAGE    Insufficient wavelength coverage
2    SMALL_DELTA_CHI2   $\Delta\chi^2_r$ between best and second-best fit is less than 0.01 (or 0.01 × the minimum $\chi^2_r$)
3    NEGATIVE_MODEL     Synthetic spectrum negative, disabled for BOSS DR9
4    MANY_OUTLIERS      More than 5% of points above 5-σ from synthetic spectrum, disabled for BOSS DR9
5    Z_FITLIMIT         $\chi^2$ minimum for best model is at the edge of the redshift range
6    NEGATIVE_EMISSION  Negative emission in a quasar line at 3-σ significance or greater (see §3.3)
7    UNPLUGGED          Broken or unplugged fiber
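The "NOQSO" quantities described in this subsection can be sketched as a variant of the global ranking in which quasar fits are dropped and only the absolute $\Delta\chi^2_r$ criterion is applied. As before, the tuple layout and function names are hypothetical, and the velocity separation is approximated as $c\,|\Delta z|/(1+z)$, which is an assumption.

```python
C_KMS = 299792.458
SMALL_DELTA_CHI2 = 1 << 2   # bit 2 of the ZWARNING_NOQSO mask


def velocity_separation_kms(z_ref, z_other):
    """Approximate velocity difference between two redshifts (assumed form)."""
    return C_KMS * abs(z_other - z_ref) / (1.0 + z_ref)


def best_noqso(candidates, abs_threshold=0.01, min_sep_kms=1000.0):
    """Best non-quasar fit with the absolute-only confidence test.

    candidates: list of (reduced_chi2, z, class_name) tuples.
    Returns (class_noqso, z_noqso, zwarning_noqso).
    """
    ranked = sorted((c for c in candidates if c[2] != "QSO"),
                    key=lambda c: c[0])
    if not ranked:
        raise ValueError("no non-quasar candidates supplied")
    best_chi2, best_z, best_class = ranked[0]
    zwarning = 0
    for chi2, z, _ in ranked[1:]:
        if velocity_separation_kms(best_z, z) > min_sep_kms:
            if chi2 - best_chi2 < abs_threshold:   # no relative cut here
                zwarning |= SMALL_DELTA_CHI2
            break
    return best_class, best_z, zwarning
```

With this sketch, a spectrum whose global minimum is a quasar fit still receives a galaxy (or star) redshift, and a pair of non-quasar fits separated by 0.015 in reduced $\chi^2$ is treated as confident even when the minimum reduced $\chi^2$ is 2.0.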

3.3. Parameter measurements 

The primary outputs of the analysis code described in 
this work are the classification, redshift, redshift error, 
and best template-based model fit to each spectrum. The 
code also measures a number of parameters assuming the 
best-fit classification and redshift. Specifically: stellar 
velocity dispersions are measured for objects classified 
as galaxies; emission-line parameters are measured for 
all objects; and supplemental stellar sub-classifications 
and radial velocities are measured for objects classified 
as stars. We now describe these three measurement pro- 
cedures in turn. 

Stellar velocity dispersions $\sigma_v$ of galaxies are measured
using a stellar template basis derived from the ELODIE
library (Prugniel & Soubiran 2001), covering the rest-frame
spectral range 4100-6800 Å. The high-resolution
ELODIE spectra are degraded to the binning scale and
approximate resolution of the co-added BOSS spectra,
and a PCA of the library is performed. The first 24



TABLE 4

SUMMARY OF BOSS REDSHIFT & CLASSIFICATION DEGREES OF FREEDOM

Class       Template Filename           $N_{\rm temp}$, $N_{\rm poly}$ Per Fit (a)
GALAXY      spEigenGal-55740.fits       4, 3
QSO         spEigenQSO-55732.fits       4, 3
STAR (b)    spEigenStar-55734.fits      1, 4
STAR (CV)   spEigenCVstar-55734.fits    3, 3

(a) $N_{\rm temp}$ is the number of basis templates per fit; $N_{\rm poly}$ is the number of
additive polynomial background terms used in addition to the template
basis in the fit.

(b) There are a total of 123 non-CV stellar subtypes considered, but
each trial fit only includes a single template.



principal components are used as a basis for fitting the
galaxy spectra. The entire PCA basis is incrementally
broadened from 0 to 850 km s$^{-1}$ in steps of 25 km s$^{-1}$, and
the set of all broadened PCA components is cached for
the analysis of all galaxy spectra. For each galaxy spectrum,
the stellar PCA basis is redshifted to match that
galaxy's redshift. At each trial broadening, the galaxy
spectrum is fit with an error-weighted least-squares linear
combination of the broadened stellar PCA basis plus a
quartic polynomial, while masking the regions surrounding
common emission lines. This marginalization over
stellar-population effects at each trial $\sigma_v$ value serves to
absorb some of the systematic errors of "template mismatch"
into the statistical velocity dispersion error. The
$\chi^2$ goodness-of-fit statistic for the best model is tabulated
for each broadening step, to define a $\chi^2(\sigma_v)$ curve. The
minimum-$\chi^2$ velocity-dispersion value (with sub-grid
localization) is reported as the measured value VDISP, and
the error on this measurement VDISP_ERR is estimated
from the curvature of the $\chi^2$ function at the position of
the minimum. Note that this analysis is highly analogous
to the redshift measurement procedure described in
§3.1. This velocity dispersion measurement algorithm is
unchanged from SDSS-I/II.
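The sub-grid localization and curvature-based error can be sketched as follows. This is a minimal illustration that assumes a uniformly spaced grid and a three-point parabolic fit around the lowest grid point; the exact interpolation used in the pipeline is not specified here, and the function name is hypothetical.

```python
def localize_minimum(grid, chi2):
    """Locate the minimum of a sampled chi^2 curve to sub-grid accuracy.

    A parabola is fit through the lowest grid point and its two neighbours
    (an assumed scheme). Returns (x_min, x_err), where x_err is the
    half-width at which the parabola rises by delta chi^2 = 1.
    """
    k = min(range(len(chi2)), key=chi2.__getitem__)
    if k == 0 or k == len(chi2) - 1:
        raise ValueError("minimum lies on the edge of the grid")
    h = grid[k] - grid[k - 1]  # uniform grid spacing is assumed
    y0, y1, y2 = chi2[k - 1], chi2[k], chi2[k + 1]
    second_diff = y0 - 2.0 * y1 + y2
    if second_diff <= 0.0:
        raise ValueError("chi^2 curve is not convex at the minimum")
    curvature = second_diff / h**2  # second derivative of the parabola
    x_min = grid[k] - 0.5 * h * (y2 - y0) / second_diff
    x_err = (2.0 / curvature) ** 0.5  # delta chi^2 = 1 half-width
    return x_min, x_err
```

With a grid running from 0 to 850 km s$^{-1}$ in 25 km s$^{-1}$ steps, an exactly parabolic $\chi^2$ curve is recovered to within rounding error, even when its minimum falls between grid points.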

Within the BOSS data set, the S/N per pixel in galaxy
spectra is often below the threshold commonly adopted
as minimally sufficient for accurate point estimation of
the stellar velocity dispersion. However, it has been
shown by Shu et al. (2012) that unbiased measurements
of the distribution of velocity dispersions within a large
sample of galaxies can be made even when the individual
spectra are of low S/N, by means of a hierarchical
analysis that marginalizes statistically over the likelihoods
of all possible velocity-dispersion values for each
galaxy. To enable such analyses, we also compute and report
the velocity-dispersion likelihood function for each
galaxy in the vector-valued column VDISP_LNL. This is
defined by $-\chi^2(\sigma_v)/2$ for velocity dispersions $\sigma_v$ from
0 to 850 km s$^{-1}$ in steps of 25 km s$^{-1}$. The baseline and
overall $\chi^2$ computation method are the same as described
above for the measurement of the VDISP point estimators.
However, the VDISP_LNL calculation employs only
the first five stellar PCA template basis spectra, and also
marginalizes over galaxy redshift uncertainties. An additional
difference is that while the VDISP computations
are done only for objects with CLASS of galaxy (for consistency
with the SDSS-I/II practice), the VDISP_LNL calculations
are done only for objects with CLASS_NOQSO of
galaxy (for consistency with BOSS practice).
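As a simple illustration of how a VDISP_LNL vector might be used for a single object, the sketch below converts the tabulated values of $-\chi^2(\sigma_v)/2$ into normalized weights on the 0 to 850 km s$^{-1}$ grid. It assumes a flat prior in $\sigma_v$ and is not the hierarchical analysis of Shu et al. (2012); the function names are hypothetical.

```python
import math


def normalized_likelihood(lnl):
    """Convert log-likelihood samples (-chi^2/2) into weights that sum to 1."""
    peak = max(lnl)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(v - peak) for v in lnl]
    total = sum(weights)
    return [w / total for w in weights]


def mean_sigma(sigma_grid, lnl):
    """Likelihood-weighted mean velocity dispersion (flat prior assumed)."""
    p = normalized_likelihood(lnl)
    return sum(s * w for s, w in zip(sigma_grid, p))
```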

Emission-line parameters for the 31 transitions listed
in Table 5 are computed for all spectra for which those
lines fall into the observed BOSS wavelength range. Each
line is modeled as a Gaussian, and the amplitudes, centroids,
and widths of all lines are optimized non-linearly
to obtain a minimum-$\chi^2$ fit to the data. The background
continuum spectrum is taken from the best-fit velocity-dispersion
model (for galaxies), from the best-fit redshift-pipeline
model (for stars), and from a linear fit to the
sidebands of each line (for quasars and for ranges of the
galaxy spectra that extend beyond the coverage of the
ELODIE-based velocity-dispersion templates). All lines
are constrained to have the same redshift within the fit,
with the exception of Lyα, which is allowed to fit at a
different redshift to account for the asymmetric effects of
Lyα forest absorption. In addition, groups of lines are
constrained to have the same line width as noted in Table 5,
so as to allow robust fits to the strengths of low-S/N
emission lines. Hence, the reported line widths are effectively
a strength-weighted average over the group. An
initial guess for the line redshifts is taken from the best-fit
pipeline redshift. Emission-line redshifts are allowed
to depart arbitrarily from this value, but in practice are
well-constrained in cases with significant emission in any
lines: 96% of the quasars with significant C IV emission
have line fits within 6000 km s$^{-1}$ of the template redshift,
and 96% of the galaxies with significant [O II] emission
have fits within 100 km s$^{-1}$. Line fluxes, line widths,
line redshifts, estimated continuum levels, and observed-frame
equivalent widths are reported by the line-fitting
code, along with associated errors. In the SDSS-I/II
implementation of the idlspec2d emission-line measurement
code, equivalent widths were measured relative to
the estimated continuum spectrum at line center, while
for BOSS DR9 this has been changed to use a continuum
level estimated from the sidebands of the line.

Based on the results of the line-fitting code, galaxy
spectra with emission in all four of the lines Hβ,
[O III] 5007, Hα, and [N II] 6583 detected at 3-sigma or
greater are sub-classified into AGN, STARFORMING, and
STARBURST according to the following rules. First, galaxies
are sub-classified as AGN if

$$\log_{10}([\mathrm{O\,III}]/\mathrm{H}\beta) > 1.2\,\log_{10}([\mathrm{N\,II}]/\mathrm{H}\alpha) + 0.22 \qquad (1)$$

(Baldwin et al. 1981). For galaxies falling on the other
side of this cut, sub-classification is made based on the
equivalent width of Hα: STARFORMING if less than 50 Å,
and STARBURST if greater. Galaxies and quasars may
be given an additional sub-classification as BROADLINE if
they have line widths in excess of 200 km s$^{-1}$, with line-width
measurement significance of at least 5-sigma, and
line-flux measurement significance of at least 10-sigma.
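The galaxy sub-classification rules can be sketched as a small function. This is an illustrative reading of Equation (1) exactly as written above, for a galaxy that already has all four lines detected at 3-sigma or greater; the function and argument names are hypothetical, and the treatment of an Hα equivalent width of exactly 50 Å (assigned here to STARBURST) is an assumption, since the text does not specify it.

```python
import math


def galaxy_subclass(oiii, hbeta, nii, halpha, halpha_ew):
    """Sub-classify a galaxy with all four lines detected at >= 3 sigma.

    oiii, hbeta, nii, halpha: positive line fluxes of [O III] 5007, H-beta,
    [N II] 6583, and H-alpha. halpha_ew: H-alpha equivalent width in Angstroms.
    """
    lhs = math.log10(oiii / hbeta)
    rhs = 1.2 * math.log10(nii / halpha) + 0.22  # Equation (1) as written
    if lhs > rhs:
        return "AGN"
    # Other side of the cut: split on the H-alpha equivalent width.
    return "STARFORMING" if halpha_ew < 50.0 else "STARBURST"
```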

For spectra classified as stars, an additional fit to
the ELODIE stellar library (Prugniel & Soubiran 2001)
is performed. The ELODIE library contains 709 stars
spanning spectral types O to M, luminosity classes V to
I, and metallicities [Fe/H] from -3.0 to +0.8. The observed
resolution was 42,000 over the wavelength range
4100 to 6800 Å. Our fitting makes use of the release of
this library at resolution 10,000 that was calibrated to
0.5% in narrow-band spectrophotometric precision and
2.5% in broad-band precision. This library was trimmed
from 709 stars to the 610 that are not binary or triple
stars. The ELODIE spectra are convolved with Gaus- 



TABLE 5

EMISSION LINES MEASURED BY THE BOSS PIPELINE

Wavelength (a)   Line Name            Redshift Group (b)   Width Group (c)
1215.67          Lyα                  Lyα                  Lyα
1240.81          N V 1240             emission             N V
1549.48          C IV 1549            emission             emission
1640.42          He II 1640           emission             emission
1908.734         C III] 1908          emission             emission
2799.49          Mg II 2799           emission             emission
3726.032         [O II] 3725          emission             emission
3728.815         [O II] 3727          emission             emission
3868.76          [Ne III] 3868        emission             emission
3889.049         Hε                   emission             Balmer
3970.00          [Ne III] 3970        emission             emission
4101.734         Hδ                   emission             Balmer
4340.464         Hγ                   emission             Balmer
4363.209         [O III] 4363         emission             emission
4685.68          He II 4685           emission             emission
4861.325         Hβ                   emission             Balmer
4958.911         [O III] 4959 (d)     emission             emission
5006.843         [O III] 5007 (d)     emission             emission
5411.52          He II 5411           emission             emission
5577.339         [O I] 5577           emission             emission
5754.59          [N II] 5755          emission             emission
5875.68          He I 5876            emission             emission
6300.304         [O I] 6300           emission             emission
6312.06          [S III] 6312         emission             emission
6363.776         [O I] 6363           emission             emission
6548.05          [N II] 6548 (e)      emission             emission
6562.801         Hα                   emission             Balmer
6583.45          [N II] 6583 (e)      emission             emission
6716.44          [S II] 6716          emission             emission
6730.82          [S II] 6730          emission             emission
7135.790         [Ar III] 7135        emission             emission

(a) Wavelengths are quoted in air for optical transitions and in vacuum
for UV transitions below 2000 Å.

(b) Emission lines of a common "redshift group" are constrained to
have the same redshift in the line fitting procedure.

(c) Emission lines of a common "width group" are constrained to
have the same intrinsic velocity width in the line fitting procedure.

(d) [O III] 5007 and [O III] 4959 are constrained to have a 3:1 line-flux
ratio.

(e) [N II] 6583 and [N II] 6548 are constrained to have a 3:1 line-flux
ratio.

sian functions to match the resolution of the BOSS spec- 
tra. A later release of this library (ELODIE 3.1) was 
not used due to extensive masking of regions near sky
emission that compromises its use for measuring radial
velocities to high precision (Prugniel et al. 2007). Each
BOSS spectrum classified as a star is re-fit to all spectra
in this trimmed ELODIE library with the identical
redshift-fitting code used to determine the primary redshift
(§3.1). These fits are limited to the 4100-6800 Å
wavelength range, include 3 polynomial terms, and span
velocities from -1000 to +1000 km s$^{-1}$. The physical parameters
of the best-fit ELODIE template are included
in the pipeline outputs (ELODIE_TEFF, ELODIE_LOGG,
ELODIE_FEH), along with the redshift (ELODIE_Z), statistical
error of the redshift (ELODIE_Z_ERR), and reduced $\chi^2$
of that fit (ELODIE_RCHI2). An estimate of the template-mismatch
effects on the redshift is provided as the standard
deviation in redshift among the best 12 ELODIE
template fits (ELODIE_Z_MODELERR).
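The template-mismatch estimate can be illustrated with a short sketch. It assumes that "best" means lowest reduced $\chi^2$ and uses the population standard deviation; the text does not state which estimator the pipeline uses, and the function name and input layout are hypothetical.

```python
import statistics


def template_mismatch_error(fits, n_best=12):
    """Scatter in redshift among the best-fitting templates.

    fits: list of (rchi2, z) pairs, one per ELODIE template fit.
    Returns the standard deviation of z over the n_best fits with the
    lowest reduced chi^2 (population form, an assumed convention).
    """
    best = sorted(fits, key=lambda fit: fit[0])[:n_best]
    return statistics.pstdev([z for _, z in best])
```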
The BOSS pipeline also computes and reports median
spectroscopic signal-to-noise ratios per 69 km s$^{-1}$
pixel (SN_MEDIAN) over the five SDSS broadband wavelength
ranges (ugriz; Fukugita et al. 1996), along with
the synthetic broadband fluxes predicted by the spectrum
(SPECTROFLUX) and by the best-fit template model to
the spectrum (SPECTROSYNFLUX).

TABLE 6

DR9 REDSHIFT AND CLASSIFICATION PIPELINE OUTPUT FILES (a)

spZbest-pppp-mmmmm.fits    Best-fit redshift & class params.
spZall-pppp-mmmmm.fits     Parameters for all fits
spZLine-pppp-mmmmm.fits    Emission-line parameters
spAll-v5_4_45.fits         Summary params. for all spectra
spAllLine-v5_4_45.fits     Line fit params. for all spectra

(a) The strings pppp and mmmmm represent the 4-digit PLATE and 5-digit
MJD identifiers for files that are created on a plate-by-plate basis.
The string v5_4_45 denotes the frozen version of the idlspec2d
software used for the processing of the DR9 spectroscopic data
sample. Full documentation of these and other pipeline output
files is found at http://www.sdss3.org/dr9/

As described in Ahn et al. (2012), DR9 also includes
catalogs of alternative parameter measurements
for BOSS galaxies, which are documented in other
publications. Chen et al. (2012) describe PCA-based
stellar-population parameter measurements and velocity-dispersion
estimates. Thomas et al. (2012) have measured
stellar velocity dispersions using the pPXF software
of Cappellari & Emsellem (2004) and emission-line
properties using the GANDALF software of Sarzi et al.
(2006). Finally, Maraston et al. (2012) have derived photometric
stellar-mass estimates for BOSS galaxies. All
these measurements are distributed with DR9, but are
separate from the core idlspec2d pipeline system described
here.

3.4. Output files 

The BOSS idlspec2d redshift pipeline generates output
files for each plate, along with summary files to aggregate
photometric and spectroscopic parameters across
the entire BOSS survey data set. These files are listed
in Table 6; together they contain all the parameters described
in this paper. Access to these files on the SDSS-III
Science Archive Server (SAS), as well as full data-model
documentation of their formats and contents, can
be obtained through the SDSS-III DR9 website. The
spAll summary file from the BOSS pipeline is analogous
but not identical in form and content to the specObj file
loaded by the SDSS-III Catalog Archive Server (CAS),
which contains both SDSS-I/II and BOSS data.

Approximately 8% of BOSS spectra are repeat observations
of previously observed targets, due both to re-observations
of entire plates and to re-targeting of a number
of objects on more than one plate (see Dawson et al.
2012). Of particular note within the summary files, the
best spectroscopic observation of each object (defined by
a 2" positional match) in the survey is selected according
to the following rules:

1. Prefer spectra with positive median S/N per spectroscopic
pixel within the r-band wavelength range
over other observations.

2. Prefer spectra with ZWARNING = 0 over other spectra
(or ZWARNING_NOQSO = 0 for galaxy-sample targets).

3. Prefer spectra with higher median S/N per spectroscopic
pixel within the r-band wavelength range.

The best observation for each object is designated by set- 
ting the parameter SPECPRIMARY equal to 1 in the spAll 
file, while setting it equal to zero for all other spectro- 
scopic observations of a given object that may be present 
within the survey data set. 
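The selection of the primary observation can be sketched as a ranking over the repeat spectra of one object. This sketch assumes the three rules are applied in strict priority order (rule 1 before rule 2 before rule 3), which is one natural reading of the numbered list; the dictionary keys and function names are hypothetical.

```python
def best_observation(observations, galaxy_target=False):
    """Return the index of the preferred spectrum among repeat observations.

    Each observation is a dict with keys 'sn_r' (median S/N per pixel in the
    r-band range), 'zwarning', and 'zwarning_noqso'.
    """
    def rank(i):
        obs = observations[i]
        flag = obs["zwarning_noqso"] if galaxy_target else obs["zwarning"]
        # Tuples compare element by element: rule 1, then rule 2, then rule 3.
        return (obs["sn_r"] > 0, flag == 0, obs["sn_r"])

    return max(range(len(observations)), key=rank)


def specprimary_flags(observations, galaxy_target=False):
    """SPECPRIMARY-style flags: 1 for the best observation, 0 for the others."""
    best = best_observation(observations, galaxy_target)
    return [1 if i == best else 0 for i in range(len(observations))]
```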

4. TEMPLATE CLASSES 

In order to compare and select among galaxy, quasar, 
and stellar models objectively and with the highest sta- 
tistical significance, the BOSS pipeline requires redshift 
and classification measurement templates that span both 
the full space of physical object types within the sur- 
vey and the full wavelength range of the spectrograph. 
BOSS expands on SDSS-I/II in both regards, and hence 
requires a new set of pipeline templates, which we now 
describe. 

4.1. Galaxies 

The idlspec2d galaxy redshifts for SDSS-I/II were
measured using templates generated from 480 galaxies
observed on SDSS plate 306, MJD 51690.$^{24}$ Redshifts
for this training set were established by modeling each
spectrum across a range of trial redshifts as a linear combination
of (1) the leading two components of a PCA
analysis of 10 velocity-standard stars in M67 observed
by SDSS-I plate 321 on MJD 51612, (2) a set of common
optical emission lines modeled as narrow Gaussian
profiles, and (3) a low-order polynomial. The adopted
redshift for each galaxy was taken from the location of
the minimum-$\chi^2$ value localized to sub-grid accuracy, in
the same manner described in §3.1 above. Using these
redshifts, the training-sample spectra were transformed
to a common rest-frame baseline, and input to an iterative
PCA procedure that accounts for measurement
errors and missing data (e.g., Tsalmantza & Hogg 2012,
and references therein). The leading four "eigenspectra"
from this procedure were taken to define the galaxy redshift
template basis for SDSS-I/II. For the commissioning
analysis of BOSS spectra, these same templates were
used for measuring galaxy redshifts, despite their lack
of z — coverage redward of 9300 A and their under- 
representation of post-starburst galaxies (which appear 
with more frequency in the BOSS CMASS sample than 
in SDSS-I/II). 

To generate a new redshift template set for use in automated
analysis of BOSS spectra, we select a set of BOSS
galaxies with redshifts over the interval 0.05 < z < 0.8
that are well-measured by the original SDSS templates.
To increase S/N and flatten the coverage of galaxy parameter
space before performing a PCA to generate the
template set, we bin together galaxies with similar 4000 Å
break strengths (D4000; Balogh et al. 1999) and redshifts,
for the purposes of stacking their spectra. We use
a D4000 range from 1.0 to 2.2 with a binning interval
of 0.2, and a redshift binning interval of 0.05. In some

24 These spectra are tabulated in the file eigeninput_gal.dat
within the templates subdirectory of the idlspec2d product.

Fig. 3. BOSS redshift and classification template basis sets for
galaxies (top), quasars (middle), and CV stars (bottom).

D4000-redshift bins, we further subdivide the galaxies
into several $\mathrm{H}\delta_A$ sub-bins. The number of sub-bins depends
on the number of galaxies in each D4000-redshift
bin: if the total number is smaller than 600, we do not
divide further into $\mathrm{H}\delta_A$ sub-bins; if the number is in the
range 600-1200, we divide into two sub-bins; and if there
are more than 1200, we divide into three sub-bins.
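The binning rules can be written compactly as below. This is a sketch only: the function names are hypothetical, the bin-index helper assumes bins are counted from a supplied starting edge (for example 1.0 for D4000 with a width of 0.2), and the treatment of values that fall exactly on a bin edge is not specified in the text.

```python
import math


def n_hdelta_subbins(n_galaxies):
    """Number of H-delta_A sub-bins for a D4000-redshift bin of a given size."""
    if n_galaxies < 600:
        return 1  # bin is not divided further
    if n_galaxies <= 1200:
        return 2
    return 3


def bin_index(value, start, width):
    """Zero-based index of the bin containing value (assumed bin layout)."""
    return math.floor((value - start) / width)
```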

We also select a set of "post-starburst" galaxies from
the BOSS galaxy sample, defined by having either

$$D4000 < 1.3 \quad \mathrm{and} \quad (\mathrm{H}\delta_A + \mathrm{H}\gamma_A)/2 > 7 \qquad (2)$$

or

$$(\mathrm{H}\delta_A + \mathrm{H}\gamma_A)/2 > \max[-17.50 \times D4000 + 29.25,\; 3] . \qquad (3)$$

This criterion leads to a sample of about 2400 post-starburst
galaxies, which we divide into five bins in redshift
with equal numbers of galaxies per bin. We then
stack the rest-frame spectra of all galaxies in each bin,
to generate a set of high-S/N stacked spectra across the
range of parameters indicated.
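Equations (2) and (3) translate directly into a boolean test. The sketch below is illustrative, with a hypothetical function name; the inputs are the D4000 break strength and the two Balmer absorption indices.

```python
def is_post_starburst(d4000, hdelta_a, hgamma_a):
    """Apply the post-starburst selection of Equations (2) and (3)."""
    mean_balmer = 0.5 * (hdelta_a + hgamma_a)
    # Equation (2): weak 4000 A break together with strong Balmer absorption.
    cut_2 = d4000 < 1.3 and mean_balmer > 7.0
    # Equation (3): Balmer absorption above a D4000-dependent threshold.
    cut_3 = mean_balmer > max(-17.50 * d4000 + 29.25, 3.0)
    return cut_2 or cut_3
```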

Once all these stacked spectra are in hand, we fit
stellar continuum models to them (Brinchmann et al. 2004;
Tremonti et al. 2004) using simple stellar population
(SSP) models. Our SSP models are taken
from Maraston et al. (2009) and Maraston & Stromback
(2011), and are based on a combination of theoretical
and observational stellar library data from
Rodriguez-Merino et al. (2005), Sanchez-Blazquez et al.
(2006), and Gustafsson et al. (2008). In the rest-frame
wavelength range 1900-9900 Å, we patch the stacked
spectra with the fitted continuum models for pixels with
S/N smaller than 10, pixels where the difference between
models and stacks is larger than 30%, and pixels where
there are no observations. We also add Hα, [N II], and
[S II] emission lines for cases where these lines fall outside
the range of observed wavelengths used to generate
the stacked spectra. This is accomplished by selecting
galaxies with similar D4000, $\mathrm{H}\delta_A$, and dust extinction
as the galaxies used to make the stacks, and
computing the median values of $\sigma_{\mathrm{H}\alpha}/\sigma_{\mathrm{H}\beta}$, $f_{\mathrm{H}\alpha}/f_{\mathrm{H}\beta}$,
$\sigma_{\rm [N\,II]}/\sigma_{\rm [O\,III]}$, $f_{\rm [N\,II]}/f_{\rm [O\,III]}$, $\sigma_{\rm [S\,II]}/\sigma_{\rm [O\,III]}$, and $f_{\rm [S\,II]}/f_{\rm [O\,III]}$
for these comparison samples (here, $\sigma$ is Gaussian line
dispersion and $f$ is line flux). By multiplying these ratios
with the appropriate line width or flux of Hβ or [O III]
from the stacks, we predict the line widths and fluxes for
Hα, [N II], and [S II] to be added to the stacked spectra,
which we do using a Gaussian model for each line.
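The pixel-patching rule can be sketched as follows. This illustration assumes that the 30% discrepancy is measured relative to the model flux and that pixels without observations are marked with None; both are assumptions, and the function name is hypothetical.

```python
def patch_stack(stack, model, snr, min_snr=10.0, max_frac_diff=0.30):
    """Replace unreliable stacked pixels with the fitted continuum model.

    stack: stacked fluxes, with None where there are no observations.
    model: fitted continuum model fluxes on the same pixels.
    snr:   per-pixel S/N of the stack.
    """
    patched = []
    for flux, mod, sn in zip(stack, model, snr):
        missing = flux is None
        noisy = (not missing) and sn < min_snr
        # Fractional difference is taken relative to the model (assumed).
        discrepant = (not missing) and abs(flux - mod) > max_frac_diff * abs(mod)
        patched.append(mod if (missing or noisy or discrepant) else flux)
    return patched
```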

At the end of this process, we have 160 stacked and
patched spectra. We augment these data with a sample
of 28 type-II quasars (e.g., Zakamska et al. 2003;
Reyes et al. 2008) identified within the BOSS spectroscopic
data set (see the discussion in §5). This full set of
spectra is then used as input to the rest-frame spectrum
PCA algorithm to generate the four-component BOSS
galaxy redshift template basis, which is shown in the top
panel of Figure 3.

4.2. Quasars 

Quasar redshift templates are generated from a training
sample of targets selected from the SDSS DR5
quasar catalog (Schneider et al. 2007) and targeted for
re-observation with the BOSS spectrographs. The targets
were chosen from the catalog at random, while enforcing
as uniform a distribution as possible in redshift.
As of 2011 June 10, 571 objects from this sample had
been observed by BOSS. Removal of three spectra for
localized cosmetic defects gives a training sample of 568
BOSS quasars. The distribution in redshift of the targeted
sample and the observed sample is shown in Figure 4.
The observed sample is weighted more heavily
above redshift $z \approx 2.2$, in accordance with overlapping
BOSS quasar sample priorities. We keep this weighting
in the training set, since we want our redshifting performance
to be particularly well tuned for the redshift range
of interest to the BOSS Lyα forest program.

Using the redshifts given by Schneider et al. (2007), we
shift these training spectra to their rest frames and perform
a PCA of the sample, with iterative replacement to
fill in missing data. The top four principal components
are retained and used as the linear basis set for our automated
redshift and classification measurements, and are
shown in the middle panel of Figure 3.

We do not employ the redshift estimates of
Hewett & Wild (2010) for the quasar-template training
sample because the current BOSS pipeline is not
configured to incorporate the absolute-magnitude information
that would be necessary to take advantage
of the increased precision afforded by these redshifts.
Future BOSS pipeline versions may incorporate the
Hewett & Wild (2010) approach. We note that the primary
criterion for BOSS spectroscopic pipeline performance
on quasar targets is to minimize catastrophic redshift
failures. Several detailed approaches to maximizing
quasar redshift precision are being investigated within
the BOSS quasar science working group, but all of these
rely on having essentially correct initial quasar redshifts
from the idlspec2d pipeline and/or visual inspection
procedures (Paris et al. 2012).

Fig. 4. Redshift distribution of 1000 targeted (gray) and 571
observed (black) quasar training spectra. Spectra from the observed
distribution are used to construct the PCA-based quasar
redshift templates used for automated classification and redshift
measurement in BOSS DR9 and shown in the middle panel of
Figure 3.

4.3. Stars 

Although stellar science is not a primary goal of BOSS,
the redshift pipeline must successfully flag stars from
within the galaxy and quasar target samples of the survey.
There is currently no comprehensive library of observed
stellar spectra covering the full usable wavelength
range of the BOSS spectrograph and the full H-R diagram.
To assemble a set of stellar templates suitable
to BOSS spectrum classification, we use a hybrid approach
that extends data from the Indo-US observational
stellar spectrum library (Valdes et al. 2004, selected
to provide uniform coverage of the space of stellar
atmosphere parameters $T_{\rm eff}$, $\log g$, and [Fe/H]) using
theoretical atmosphere models computed using the
MARCS (Gustafsson et al. 2008, for cool stars), ATLAS
(Kurucz 2005, for intermediate stars), and CMFGEN
(Hillier & Miller 1998, for hot stars) codes, obtained via
the curated POLLUX spectrum database (Palacios et al.
2010).

4.3.1. Template spectrum creation 

We start with the full database of 1273 Indo-US stellar
spectra, which have a resolution of approximately
1 Å, a reduced pixel scale of 0.4 Å, spectral coverage over
3400 Å < $\lambda$ < 9500 Å, and good flux calibration for most
stellar types. The original radial-velocity zeropoints for the
library were established either from literature or from 
velocity-standard cross-correlations. Since classification 
is the primary function of these spectra within the BOSS 
pipeline, we do not attempt any further refinement of 
these velocity zeropoints. 

We initialize a "bad pixel" mask for each Indo-US
spectrum based upon the zero-value Indo-US pixel mask
convention. Furthermore, we define the following telluric
absorption bands, and mask all pixels within them:
6850-6950 Å, 7150-7350 Å, 7560-7720 Å, 8105-8240 Å,
and >8900 Å. We then select the subset of spectra
that meet the conditions of (1) wavelength coverage from
at least 3500 Å to 8900 Å, (2) good data over at least 75%
of their pixels, (3) flux calibration with a non-flat (i.e.,
stellar) standard, and (4) no single gap within the spectrum
larger than 200 Å (the largest adopted telluric band
width). These cuts result in a sample of 879 spectra covering
spectral types from O6 to M8, but exclude carbon
stars (which are fluxed with a flat SED in the Indo-US
library).

We then take the 1040 model atmosphere spectra from 
the POLLUX database ranging in temperature from 
3000 K to 49000 K, convolved and binned to the reso- 
lution and sampling of the Indo-US spectra. For each 
Indo-US spectrum in our subset, we loop over all model 
atmospheres and determine the multiplicative scaling of 
the model that minimizes the sum of squared data-minus- 
model residuals over non-masked pixels. We adopt the 
model spectrum that gives the overall minimum sum of 
squared residuals as being the "best fit" for a particular 
data spectrum. 
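The model-matching step can be sketched with plain lists. For a single multiplicative scale $a$, the sum of squared residuals over unmasked pixels is minimized at $a = \sum d_i m_i / \sum m_i^2$; the sketch then loops over all models and keeps the one with the smallest residual sum. Function names and the mask convention (True marks a bad pixel) are assumptions made for illustration.

```python
def best_scale(data, model, mask):
    """Scale a minimizing sum((d - a*m)^2) over unmasked pixels."""
    num = sum(d * m for d, m, bad in zip(data, model, mask) if not bad)
    den = sum(m * m for m, bad in zip(model, mask) if not bad)
    return num / den


def sum_sq_residuals(data, model, mask, scale):
    """Sum of squared data-minus-model residuals over unmasked pixels."""
    return sum((d - scale * m) ** 2 for d, m, bad in zip(data, model, mask) if not bad)


def best_model(data, models, mask):
    """Return (index, scale) of the model with the smallest residual sum."""
    best = None
    for k, model in enumerate(models):
        a = best_scale(data, model, mask)
        r = sum_sq_residuals(data, model, mask, a)
        if best is None or r < best[0]:
            best = (r, k, a)
    return best[1], best[2]
```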

The "best fit" model spectrum for each data spectrum
is used to extend the data wavelength coverage and interpolate
over the data gaps as follows. We define a running
window of ±400 pixels (±160 Å) about an output pixel of
interest, and determine the multiplicative scale and tilt
to apply to the model over that window in order to give
the best (least squares) fit to the non-masked data pixels
over that same range. The scaled and tilted model
is evaluated at the central pixel to define the new, locally
scaled model spectrum, and the process is repeated
over the entire spectrum by sliding the window. For pixels
centered outside the outermost pixel of data coverage
on the red and blue ends, the scale and tilt at the outermost
data-covered pixel are used. The data and "sliding-scaled"
model spectra are then combined into a single output
spectrum. Pixels where the data have no coverage are
assigned 100% model. On either side of each data gap we
define a 100-pixel (40 Å) transition region in which the
output spectrum is a weighted combination of the data
and the sliding-scaled model, with the weight varying linearly
from 0% model + 100% data to 100% model +
0% data over the transition region. Finally, we convolve
and bin these output spectra down to the typical resolution
(about 3 Å FWHM) and reduced-spectrum sampling
($\Delta\log_{10}\lambda = 0.0001$ per pixel) of the BOSS data, also
transforming from air to vacuum wavelengths to match
the BOSS spectrum convention.
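The linear blending across a transition region can be illustrated as below. The sketch assumes that the model weight rises from 0 at the data side of the transition region to 1 at the edge of the gap; the direction convention and the function names are assumptions for illustration.

```python
def model_weight(distance, width=100):
    """Model fraction at a pixel `distance` pixels into the transition region.

    distance is counted from the data side (0) toward the gap edge (width);
    values outside the region are clipped to the range [0, 1].
    """
    return min(max(distance / width, 0.0), 1.0)


def blend(data_flux, model_flux, weight):
    """Weighted combination of data and sliding-scaled model flux."""
    return (1.0 - weight) * data_flux + weight * model_flux
```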

4.3.2. Archetype subset selection 

From these 879 patched and extended stellar spectra,
our goal is to select a representative subset of
"archetypes" that provide sufficient coverage of stellar
parameter space to perform automated spectroscopic
star-galaxy and star-quasar separation, while not attempting
overly detailed stellar analysis that is beyond
the scope of the BOSS science mission (cf. Lee et al.
2008).

We first visually inspect the template database and 
remove a single spectrum with noticeable data quality 
issues in an unmasked data region (Indo-US ID#33111,
5450 Å < $\lambda$ < 6000 Å). We also select the 12 template
spectra that have significant emission lines, and retain 
each of them for our final archetype set. This leaves 
866 spectra from which to select the remainder of our 
archetype sample. 

To select a subset of archetypes from the remaining set
of templates, we wish to make use of a measure of the degree
of similarity or difference between any two spectra.
We first restrict our attention to the wavelength range
3400-11000 Å, corresponding to $N_{\rm pix} = 5099$ pixels at
the processed 69 km s$^{-1}$ BOSS spectrum pixel scale. We
then re-normalize all the template spectra to satisfy

$$\sum_{i=1}^{N_{\rm pix}} f_i^2 = N_{\rm pix} , \qquad (4)$$

where $f_i$ is the flux density (in the $f_\lambda$ sense) in pixel $i$. We
define a statistic $s^2$ measuring the quality of spectrum $f'$,
scaled by a factor $a$, as a model for spectrum $f$:

$$s^2 = \sum_{i=1}^{N_{\rm pix}} (f_i - a f'_i)^2 . \qquad (5)$$

With our normalization convention, the best-fit
(minimum-$s^2$) scaling is simply given by

$$a_{\rm best} = N_{\rm pix}^{-1} \sum_{i=1}^{N_{\rm pix}} f_i f'_i , \qquad (6)$$

and the value of $s^2$ at this best scaling is given by

$$s^2_{\rm best} = N_{\rm pix} (1 - a_{\rm best}^2) . \qquad (7)$$

Note that $a_{\rm best}$ and $s^2_{\rm best}$ are symmetric under the interchange
of $f$ and $f'$: the amplitude and fit quality of one
template to another does not depend upon which one is
taken as the "data" and which one as the "model". Thus
$s^2_{\rm best}$ can be regarded as a measure of how different two
templates are from one another.
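For reference, the step from the definition of $s^2$ to its value at the best scaling can be written out explicitly. The short derivation below uses only the normalization of Equation (4), applied to both $f$ and $f'$, and the definition of $a_{\rm best}$ as $N_{\rm pix}^{-1}\sum_i f_i f'_i$ in Equation (6):

```latex
\begin{align}
s^2(a) &= \sum_{i=1}^{N_{\rm pix}} f_i^2
          - 2a \sum_{i=1}^{N_{\rm pix}} f_i f'_i
          + a^2 \sum_{i=1}^{N_{\rm pix}} f_i'^2
        = N_{\rm pix}\left(1 - 2a\,a_{\rm best} + a^2\right), \\
\frac{d s^2}{d a} &= 2 N_{\rm pix}\,(a - a_{\rm best}) = 0
        \quad\Longrightarrow\quad a = a_{\rm best}, \\
s^2_{\rm best} &= N_{\rm pix}\left(1 - 2a_{\rm best}^2 + a_{\rm best}^2\right)
        = N_{\rm pix}\left(1 - a_{\rm best}^2\right).
\end{align}
```

Because $a_{\rm best}$ depends on $f$ and $f'$ only through the symmetric sum $\sum_i f_i f'_i$, the symmetry of $s^2_{\rm best}$ under interchange of the two spectra follows immediately.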

We compute the matrix of $s^2_{\rm best}$ between all pairs of
templates in our set, and determine our archetype list in
an iterative procedure. We set a threshold of 7.5 for the
maximum $s^2_{\rm best}$ allowable between two spectra in order
for one spectrum to be an acceptable representative for
the other. This threshold was selected heuristically to
tune the size and diversity of the final sample. We then
identify the single template spectrum within the sample
that has the most $s^2_{\rm best} < 7.5$ matches to the rest of
the sample. This spectrum and all the spectra that it
matches are removed from further consideration, and the
process is iterated until all spectra have been accounted
for in this manner. For our chosen threshold, this process
identifies 105 archetypes out of 866 analyzed templates.
When added to the 12 emission-line templates, this yields
a set of 117 stellar templates for our automated spectrum
classification algorithm. Spectra fit by these templates
are tagged with the stellar subclass listed in the Indo-US
database, along with the library identification number of
the archetype spectrum.
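The iterative archetype selection is a greedy covering procedure, which can be sketched as follows. This illustration assumes that matches are counted only among spectra not yet accounted for and that ties are broken by lowest index; neither detail is stated in the text, and all function names are hypothetical.

```python
def normalize(spec):
    """Rescale a spectrum so that sum(f_i^2) equals the number of pixels."""
    n = len(spec)
    norm = (sum(f * f for f in spec) / n) ** 0.5
    return [f / norm for f in spec]


def s2_best(f, g):
    """Dissimilarity N_pix * (1 - a_best^2) between two normalized spectra."""
    n = len(f)
    a_best = sum(x * y for x, y in zip(f, g)) / n
    return n * (1.0 - a_best**2)


def select_archetypes(spectra, threshold=7.5):
    """Greedily pick archetypes until every spectrum is represented."""
    specs = [normalize(s) for s in spectra]
    n = len(specs)
    match = [[s2_best(specs[i], specs[j]) < threshold for j in range(n)]
             for i in range(n)]
    remaining = set(range(n))
    archetypes = []
    while remaining:
        # Spectrum with the most matches among those still unaccounted for.
        best = max(sorted(remaining),
                   key=lambda i: sum(match[i][j] for j in remaining))
        archetypes.append(best)
        remaining -= {j for j in remaining if match[best][j]}
        remaining.discard(best)
    return archetypes
```

The default threshold of 7.5 is meaningful only for spectra of about 5099 pixels, since $s^2_{\rm best}$ scales with $N_{\rm pix}$; a shorter toy spectrum needs a proportionally smaller threshold.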

4.3.3. Special stellar subclasses 

Several subclasses of star appear with some frequency
in the BOSS target sample, but are not represented in the
(flux-calibrated) Indo-US library. For these subclasses,
representative training samples from within the BOSS
data set are identified based upon classification using
SDSS-I/II stellar templates. New templates are derived
by averaging the spectra of these training sets within
a PCA framework. The six subclasses and the number
of training spectra for each of them are: (1) 47 carbon
stars; (2) 50 "hotter" white dwarfs with $u - g < 0.3$; (3)
50 "cooler" white dwarfs with $u - g > 0.3$; (4) 19 calcium
white dwarfs; (5) 31 magnetic white dwarfs; and (6) 50
L dwarfs. In addition, a sample of 18 cataclysmic variable
stars (CVs) with prominent emission lines is used
to define a CV star eigenbasis of 3 PCA modes, which
is shown in the bottom panel of Figure 3. Because of
the use of multiple eigenvectors rather than a single average
spectrum, CV stars are treated as an object class
separate from other stars in the automated classification
analysis.

5. PERFORMANCE AND VERIFICATION 

Table 1 provides a summary of the BOSS DR9
spectroscopic data set analyzed by the redshift and
classification pipeline described in this work, along with
a number of summary performance statistics that we
now examine. Additional checks on the idlspec2d
pipeline performance for galaxy targets, in comparison
with the zcode cross-correlation redshift software
described by Cannon et al. (2006), are presented
in Dawson et al. (2012), and additional discussion of
pipeline quasar classification and redshift performance
is found in Pâris et al. (2012). The BOSS DR9 sample
contains 831,000 spectra. Of these, about 0.2% are lost
to unplugged fibers and spectra falling along bad CCD
columns. Approximately 92% of the BOSS DR9
spectra are of unique objects (as defined by a 2″ positional
match). The remaining 8% are repeat spectra from
overlapping plates or repeat observations of the same plate.

5.1. Galaxy redshift completeness and purity 

Using the Z_NOQSO redshift measurement convention
as described in §3.2, we achieve an automated
completeness (i.e., ZWARNING_NOQSO == 0 rate) of 98.7% for
the CMASS sample and 99.9% for the LOWZ sample
(from Table 1). Restricting further to objects that are
spectroscopically classified as galaxies (CLASS_NOQSO ==
"GALAXY"), we find combined targeting and measurement
completeness percentages of 95.4% for CMASS and
99.2% for LOWZ. These percentages satisfy the BOSS
science requirement of at least 94% overall galaxy red- 
shift success. For the CMASS sample, about 70% of 
the (small) survey inefficiency is due to targeting stars 
and star-galaxy superpositions rather than galaxies, and 
about 30% arises from known redshift measurement fail- 
ures. 

To verify the completeness and quantify the purity of 
the automated galaxy redshifting and classification, we 
make use of a "truth table" generated by the first au- 
thor from the visual inspection of 4864 galaxy spectra 
taken on eight plates observed during 2010 March. 25 We 
focus primarily on the CMASS sample, as this higher- 
redshift (and thus lower S/N) sample poses the greatest 
challenge to the software. Of the inspected spectra, 3666 
are CMASS targets that are above the fiber-magnitude 
threshold, not unplugged, and not falling on bad CCD 
columns. From among these 3666 galaxy-sample spec- 
tra, 3627 have confidently measured pipeline redshifts 
and classifications, giving an automated completeness of 
98.9%, consistent with the completeness of the full DR9 
CMASS sample from above. Of this subset, 3500 are clas- 
sified as galaxies (as opposed to stars) by the pipeline, 
giving a 95.5% overall sample completeness including 

25 The plates are: 3804 of MJD 55267; 3686, 3853, and 3855 of 
MJD 55268; and 3687, 3805, 3856, and 3860 of MJD 55269. 
Fig. 5. Histograms of redshift differences of LOWZ (left) and CMASS (right) galaxies that are observed more than once, scaled by the
quadrature sum of statistical error estimates in each epoch. Over-plotted are the best-fit Gaussian models, with a dispersion parameter of
σ = 1.34 for the LOWZ sample and σ = 1.19 for the CMASS sample.

Fig. 6. Distribution of estimated statistical galaxy redshift errors for LOWZ (left) and CMASS (right) sample galaxies over multiple
redshift ranges. The horizontal axes at bottom indicate raw error estimates; the gray horizontal axes at top indicate errors rescaled by the
factors illustrated in Figure 5.



target-selection efficiency, which is also consistent with 
the sample-wide value. 

To quantify the purity of the CMASS spectroscopic 
redshift sample, we first search for "catastrophic" impu- 
rities in the CMASS redshift sample, defined as spectra 
for which the pipeline reports a confident galaxy classi- 
fication and redshift, but for which the visual inspection 
yields a confident classification (of any class) with a
redshift that differs by more than Δz = 0.005. This search
yields three such spectra out of 3500: two are definite
galaxy-M-star superpositions, and the other is a possible
galaxy-galaxy superposition (for which the pipeline
redshift is more convincing in retrospect than the
inspection redshift). We next check for less clearly defined
impurities, defined as spectra for which the pipeline re- 
ports a confident galaxy classification and redshift, but 
for which the visual inspection does not yield a confi- 
dent result. This search identifies 10 such spectra, six 
of which are plausible pipeline redshifts with subjective 



visual judgments of low S/N, and the remaining four of 
which are due to artifacts associated with spectrum
combination across the spectrograph dichroic at 6000 Å (see
Item 3 in §6). Taking the three superposition spectra
and the four artifact spectra as genuine contaminants, 
we find a CMASS sample impurity rate of about 0.2%, 
satisfying the 1% maximum catastrophic redshift failure 
rate specified as the scientific requirement for BOSS. 

To check for the possibility of recoverable incompleteness,
we examine CMASS spectra for which the visual
inspections yield a confident galaxy classification and
redshift, but for which the automated pipeline yields either
no confident result (i.e., ZWARNING_NOQSO > 0), or a
classification as a star. There are 26 such spectra, which
break down as follows: 11 low-S/N spectra for which 
the pipeline's lack of confidence is statistically defensi- 
ble; 5 definite or possible galaxy-galaxy superpositions; 
4 definite or possible star-galaxy superpositions; 3 spec- 
tra with artifacts; 2 broadline AGN mistaken for stars 
(but with correct quasar-class redshifts that are excluded
by the Z_NOQSO convention); and 1 narrow-line AGN for
which the pipeline confuses [O III] 5007 and Hα. Taking
the 11 noisy but visually convincing redshifts and the 
three AGN spectra to represent the recoverable sample, 
we find an excess incompleteness of about 0.4% relative 
to the maximum attainable given the data. 
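
As a cross-check, the truth-table percentages quoted in this subsection can be reproduced from the stated counts. The short calculation below is only a sketch: the counts are taken from the text, but the choice of denominators for the impurity and excess-incompleteness rates (the 3500 pipeline-classified galaxies and the 3666 inspected CMASS spectra, respectively) is an assumption, since the text does not state them explicitly.

```python
# Counts quoted in the text for the visually inspected CMASS subsample.
n_inspected = 3666          # CMASS spectra passing the basic quality cuts
n_confident = 3627          # confident pipeline redshift and classification
n_galaxy = 3500             # of those, classified as galaxies
n_contaminants = 3 + 4      # superpositions + dichroic artifacts
n_recoverable = 11 + 3      # noisy-but-convincing redshifts + AGN

completeness = 100.0 * n_confident / n_inspected
galaxy_completeness = 100.0 * n_galaxy / n_inspected
# Assumed denominators (not stated explicitly in the text):
impurity = 100.0 * n_contaminants / n_galaxy
excess_incompleteness = 100.0 * n_recoverable / n_inspected

print(f"automated completeness:  {completeness:.1f}%")
print(f"galaxy completeness:     {galaxy_completeness:.1f}%")
print(f"impurity rate:           {impurity:.1f}%")
print(f"excess incompleteness:   {excess_incompleteness:.1f}%")
```

With these assumptions the four values round to the 98.9%, 95.5%, 0.2%, and 0.4% figures given above.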

To further assess the effects of star-galaxy superposi- 
tions (for which the pipeline takes no special approach), 
we search a set of 57910 CMASS spectra from 150 plates 
for instances of a best-fit non-quasar class of GALAXY 
and a next-best non-quasar class of STAR, and exam- 
ine these spectra visually for the presence of significant 
stellar features. From this sample, we find 103 possible 
and 58 probable star-galaxy superpositions, indicating a 
total CMASS star-galaxy superposition rate of between 
0.1% and 0.2%. These star-galaxy superpositions that 
are given a spectroscopic class of GALAXY are a source 
of sample impurity, as the galaxies are typically neither 
bright enough nor of the correct color to fall within the 
CMASS color-magnitude selection cuts on their own. 
Any star-galaxy superpositions classified as STAR by the 
pipeline are excluded from the large-scale structure anal- 
ysis and contribute only to target-selection inefficiency. 

Our visual inspection set also contains 568 LOWZ 
galaxies brighter than the fiber-magnitude threshold. All 
of these spectra are confidently classified and redshifted 
by both the pipeline and the visual inspection, with three 
classified as stars. This is consistent with the automated 
completeness and stellar contamination rate for the full 
LOWZ sample, with no detectable incidence of catas- 
trophic failures. 

5.2. Galaxy redshift precision 

Redshift errors are calculated from the curvature of the 
χ² function in the vicinity of the minimum value that is
used to determine the best-fit redshift measurement. To 
assess the accuracy of these statistical error estimates, 
we make use of a set of 27170 repeat observations of 
CMASS targets and 7503 repeat observations of LOWZ 
targets within the DR9 data set. We reference all re- 
peat observations to the SPECPRIMARY observation of a 
given object, and scale the redshift difference between 
the two observations by the quadrature sum of the error 
estimates from the two epochs. We then construct a his- 
togram of these scaled velocity differences and fit it with 
a Gaussian function. If the estimated errors accounted 
for all the statistical uncertainty, these fitted Gaussians 
would have a dispersion parameter of unity. Figure 5
shows the results of this analysis, with fitted dispersions
of σ = 1.34 for the LOWZ sample and σ = 1.19 for the
CMASS sample. Thus, while slightly underestimated, 
the redshift errors are impressively close to being sta- 
tistically accurate. The greater scatter (relative to the 
statistical error estimates) for the LOWZ sample sug- 
gests that systematic effects become more important at 
higher S/N. 
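
The repeat-observation test can be illustrated with synthetic data. The sketch below is not the analysis code used for Figure 5: the redshifts and errors are simulated (hypothetical values), and the dispersion is estimated with a simple standard deviation rather than a Gaussian fit to a histogram. It shows only that, if the true scatter exceeds the reported error by a factor of 1.19, the error-scaled differences have a dispersion near 1.19 rather than unity.

```python
import math
import random
import statistics


def scaled_differences(z_a, err_a, z_b, err_b):
    """Redshift differences divided by the quadrature sum of the two errors."""
    return [
        (za - zb) / math.hypot(ea, eb)
        for za, ea, zb, eb in zip(z_a, err_a, z_b, err_b)
    ]


# Synthetic repeat observations (hypothetical numbers, not BOSS data):
# the true scatter is 1.19 times larger than the reported error.
rng = random.Random(42)
n_pairs = 20000
reported_err = 1.0e-4
true_scatter = 1.19 * reported_err
z_true = 0.55

z_first = [rng.gauss(z_true, true_scatter) for _ in range(n_pairs)]
z_second = [rng.gauss(z_true, true_scatter) for _ in range(n_pairs)]
errors = [reported_err] * n_pairs

scaled = scaled_differences(z_first, errors, z_second, errors)

# A standard deviation stands in for the Gaussian histogram fit of the text.
dispersion = statistics.pstdev(scaled)
print(f"dispersion of scaled differences: {dispersion:.2f}")
```

A standard deviation is sensitive to outliers such as the catastrophic failures discussed below, which is one reason a fit to the core of the histogram is the more robust choice on real data.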

This analysis of repeat spectra also yields 44 CMASS 
re-observations that have absolute redshift differences 
|Δz| > 0.005 between the two epochs. These are
primarily due to galaxy-galaxy superpositions at distinct
redshifts, un-masked spectrum artifacts, and a number
of type II quasars for which broad [O III] 5007 emission
is confused with Hα in one epoch but not the other (see
Fig. 7. BOSS CMASS sample redshift and classification failure
rate (i.e., ZWARNING_NOQSO > 0) as a function of median
spectroscopic S/N within the SDSS r (solid), i (dashed), and z
(dot-dashed) bandpass regions of the spectrum.

Item 2 in §6). The implied 0.16% CMASS redshift
impurity rate is consistent with the value found from the
truth-table tests of §5.1. For the LOWZ repeat
observations, two spectra yield |Δz| > 0.005, both of which are
galaxy-galaxy superpositions. 

For all CMASS and LOWZ targets, we also compute 
the distribution of estimated redshift errors as a
function of redshift. These distributions are shown in
Figure 6. In all cases, typical errors are a few tens of km s⁻¹
even when scaled up to reflect the super-statistical
scatter displayed in Figure 5. These errors are well below the
300 km s⁻¹ redshift precision requirement of the BOSS
galaxy large-scale structure science analyses. 
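
For orientation, a redshift error can be converted to a velocity error with the standard relation σ_v = c σ_z / (1 + z). The snippet below is a minimal sketch of that conversion; the input error of 10⁻⁴ at z = 0.55 is a hypothetical value chosen for illustration, and we assume (rather than quote from the pipeline) that this is the convention behind the km s⁻¹ values discussed here.

```python
C_KM_S = 299792.458  # speed of light in km/s


def redshift_error_to_velocity(sigma_z, z):
    """Velocity error (km/s) for a redshift error sigma_z at redshift z."""
    return C_KM_S * sigma_z / (1.0 + z)


# Hypothetical CMASS-like example: sigma_z = 1e-4 at z = 0.55.
raw = redshift_error_to_velocity(1.0e-4, 0.55)
# Rescaled by the CMASS dispersion factor of 1.19 from Figure 5.
rescaled = 1.19 * raw
print(f"raw: {raw:.1f} km/s, rescaled: {rescaled:.1f} km/s")
```

Both the raw and rescaled values in this example are a few tens of km s⁻¹, an order of magnitude below the 300 km s⁻¹ requirement.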

5.3. Galaxy redshift success dependence 

As in any redshift survey, spectroscopic S/N is the
primary determinant of redshift success in BOSS. Figure 7
shows the CMASS galaxy redshift failure rate
as a function of the median spectroscopic
signal-to-noise ratio over the SDSS r, i, and z bandpass
ranges, which represent the most relevant regions of the
spectrum for measuring continuum redshifts of passive
galaxies over the redshift interval z ≈ 0.4-0.8. Failure
is defined in the sense of ZWARNING_NOQSO > 0, so that
targets confidently identified as stars are counted as a
success for the pipeline even though they represent a
failure in the larger sense of galaxy targeting and
redshift measurement. We see a decrease in the failure rate
as a function of r-band S/N up to S/N_r ≈ 3, where
an asymptotic minimum of ≈ 5 × 10⁻³ is reached. For
CMASS spectra with S/N_r = 3, the typical value for
both S/N_i and S/N_z is approximately 6, consistent with
the S/N values in those bands at which the asymptotic
failure rate is reached in Figure 7.
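
A failure-rate curve of the kind shown in Figure 7 can be built by binning spectra in S/N and counting the fraction with a non-zero warning mask in each bin. The following sketch assumes hypothetical input lists of per-spectrum median S/N and ZWARNING_NOQSO values; the function name, bin edges, and toy data are illustrative and are not taken from the pipeline.

```python
def failure_rate_by_bin(snr, zwarning, edges):
    """Fraction of spectra with zwarning > 0 in each [edges[k], edges[k+1]) bin.

    Returns None for bins that contain no spectra.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    failures = [0] * n_bins
    for s, w in zip(snr, zwarning):
        for k in range(n_bins):
            if edges[k] <= s < edges[k + 1]:
                counts[k] += 1
                if w > 0:
                    failures[k] += 1
                break
    return [f / c if c else None for f, c in zip(failures, counts)]


# Toy data (hypothetical): failures concentrated at low S/N.
toy_snr = [0.5, 0.8, 1.5, 1.7, 2.5, 3.5, 3.6, 3.9]
toy_zwarning = [4, 0, 0, 4, 0, 0, 0, 0]
rates = failure_rate_by_bin(toy_snr, toy_zwarning, edges=[0, 1, 2, 3, 4])
print(rates)
```

Any non-zero mask value counts as a failure here, matching the ZWARNING_NOQSO > 0 definition used in the text, so confidently classified stars are counted as successes.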

Galaxy magnitude correlates strongly with spectro- 
scopic S/N and hence with redshift success: this is the 
motivation for the formal CMASS sample limit of i-band 
magnitude brighter than 21.5 within a 2″-diameter BOSS
fiber. To gauge the dependence of redshift completeness
on this limit, Figure 8 shows the CMASS sample
redshift failure rate as a function of i_fiber, selecting the best
single observation of each target. Targets fainter than
i_fiber = 21.5 are available from a more permissive CMASS
cut applied during commissioning observations. At the 
formal CMASS cutoff, the marginal failure rate is about 
7%. 

Fig. 8. CMASS sample failure rate as a function of apparent
(not extinction-corrected) i-band magnitude within the
2″-diameter BOSS fiber, using the best single spectroscopic
observation of each CMASS target in the DR9 data set. Vertical dashed
line at 21.5 indicates the nominal fiber-magnitude faint limit of the
CMASS sample.

The characteristics of the BOSS spectrograph optics
and CCD detectors produce a weak dependence of
redshift success rate on fiber identification number along
the linear spectrograph slit-heads. Figure 9 presents this
dependence for the CMASS sample. This figure is
generated only for targets passing the i_fiber < 21.5
cut, but includes all survey spectra of each target (i.e.,
no SPECPRIMARY cut) so as to give an unbiased picture of
performance versus fiber number. The upturns near fiber
numbers 1, 500, and 1000 are associated with the edges of
the spectrograph camera fields of view, and are described
further in Item 4 in §6 below. The effects of isolated
bad CCD columns are also evident, and are described in
Item 5 in §6. The failure rate is slightly higher on average
for fibers above 500, corresponding to a lower end-to-end
survey-averaged throughput for the optics and CCDs of
spectrograph 2 as compared to those of spectrograph 1.

In principle, variations in the quality of sky foreground 
subtraction can also affect spectroscopic redshift success. 
In practice, we do not see this effect in BOSS.
Figure 10 shows the spectrum of systematic sky-subtraction
residual flux measured from the sky-subtracted blank- 
sky fibers of a representative BOSS plate, calculated by 
subtracting statistical spectrum pixel error estimates in 
quadrature from the root-mean-square (RMS) residual 
spectrum across all sky fibers on the plate. At the posi- 
tions of bright OH air-glow lines, the systematic residu- 
als are generally at or below 1% of the sky flux. To test 
whether the redshift failure rate is affected significantly 
by variations in sky-subtraction quality, we quantify the 
level of residual flux from the sky-subtraction process in 
Fig. 9. CMASS sample failure rate as a function of fiber
number. Generated for all spectroscopic observations of CMASS
sample targets with i_fiber < 21.5. Large-scale structure is due to
spectrograph camera optics, and small-scale peaks are associated with
bad CCD columns.

Fig. 10. Systematic sky-subtraction RMS residual spectra
(black line) computed from sky-subtracted blank sky fibers of a
representative BOSS plate. Estimated statistical errors have been
subtracted in quadrature from RMS residual flux at each
wavelength. Also shown is the median sky flux spectrum scaled down
by a factor of 100.



each plate as the RMS flux in all sky-subtracted
blank-sky fibers over the wavelength range 8300 Å to 10400 Å,
where the effects of OH air-glow lines are particularly
pronounced. Figure 11 displays the results of this test,
with RMS residual flux expressed both in units of esti- 
mated statistical significance and in units of specific flux. 
In both cases, there is no discernible correlation between 
sky-subtraction residual scale and redshift failure rate. 
The two conclusions we draw are that (1) the quality of 
BOSS sky subtraction is uniformly high, and (2) residual 
variations in the quality of this sky subtraction do not 
significantly affect redshift measurement for the passive, 
continuum-dominated CMASS galaxies. 
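
The systematic residual spectrum of Figure 10 is described as the RMS sky-fiber residual with the statistical pixel errors subtracted in quadrature. A minimal sketch of that calculation follows. How the per-fiber statistical errors are combined across fibers is not specified in the text; the mean-square combination used here, the clipping of negative differences to zero, and the toy numbers are assumptions.

```python
import math


def systematic_residual(residual_flux, stat_error):
    """Per-pixel systematic residual from sky-subtracted sky fibers.

    residual_flux and stat_error are lists over fibers, each a list over
    wavelength pixels on a common grid.
    """
    n_fiber = len(residual_flux)
    n_pixel = len(residual_flux[0])
    result = []
    for p in range(n_pixel):
        mean_sq_resid = sum(f[p] ** 2 for f in residual_flux) / n_fiber
        # Assumed: statistical errors combined as a mean square over fibers.
        mean_sq_stat = sum(e[p] ** 2 for e in stat_error) / n_fiber
        # Clip at zero where the residual is consistent with pure noise.
        result.append(math.sqrt(max(mean_sq_resid - mean_sq_stat, 0.0)))
    return result


# Toy example (hypothetical flux units): two sky fibers, two pixels.
toy_residual = [[5.0, 1.0], [-5.0, -1.0]]
toy_error = [[3.0, 2.0], [3.0, 2.0]]
systematic = systematic_residual(toy_residual, toy_error)
print(systematic)
```

In the first toy pixel the RMS residual (5) exceeds the statistical error (3), leaving a systematic term of 4. In the second the residual lies below the noise estimate and the systematic term is clipped to zero.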
Fig. 11. CMASS sample redshift and classification failure rate
versus RMS residual flux in sky-subtracted sky fibers, for BOSS
plates with at least 300 CMASS galaxy sample targets. Each point
represents one PLATE-MJD. The top panel abscissa is in units of
statistical significance, while the bottom panel is in units of
specific flux. No correlation is seen. RMS significance values of less
than "1-sigma" reflect unaccounted pixel-to-pixel correlations
introduced by the re-binning and co-addition of spectra.

Fig. 12. Distribution of error-scaled redshift differences between repeat observations of BOSS quasars (left), and distribution of
estimated single-epoch statistical quasar redshift errors for multiple redshift ranges (right). True quasar redshift errors are likely dominated
by systematic effects not reflected here. The upper and lower horizontal axes in the right-hand plot are as in Figure 6.



5.4. Quasar redshift success 

Unlike the BOSS galaxy samples, the BOSS quasar
sample does not have a stated requirement on
automated classification and redshift success. The entire
quasar target sample is being manually inspected to
provide a catalog of visually verified classifications and
redshifts (Pâris et al. 2012), for which the automated BOSS
pipeline redshifts provide the initial default value. From
Table 1, we find that the idlspec2d pipeline reports a
confident classification and redshift (i.e., ZWARNING ==
0) for about 79% of the unique spectra of the BOSS
quasar target sample. The majority of the remaining
21% of the quasar sample observations are low-S/N
spectra of faint targets. Approximately 51.5% of the unique
observed quasar sample targets are spectroscopically
confirmed as quasars; most of the confidently classified
non-quasar spectra are stars (typically of spectral type F)
occupying the same region of color space as quasars in
the targeted redshift range. However, only 33.6% of the
unique target sample are confirmed quasars at the
redshifts 2.2 < z < 3.5 that are the focus of the BOSS Lyα
forest analysis (Dawson et al. 2012). Figure 13 presents
the spectroscopic confirmation rate for quasars in this
redshift range 2.2 < z < 3.5, as a function of median
S/N per pixel over the g-band wavelength range.




Fig. 13. Spectroscopic confirmation rate of quasars with
redshift 2.2 < z < 3.5 from among BOSS quasar sample targets, as a
function of median g-band S/N per spectroscopic pixel.
[Horizontal axis: median S/N per 69 km s⁻¹ coadded pixel in g band.]



The full comparison of visual redshifts and pipeline
redshifts for BOSS quasar-sample targets is presented in
Pâris et al. (2012), and is beyond the scope of the
current work. We note two particular statistics here. First,
the visual inspections provide a 1.7% increase in the
sample of 2.2 < z < 3.5 quasars beyond those that are
confidently identified by the automated pipeline. Second,
0.6% of the quasars identified confidently by the pipeline
at 2.2 < z < 3.5 either have redshifts in disagreement by
|Δz| > 0.05 with the visual-inspection values, or do not
have confident visual identification despite having been
inspected. The latter are due mostly to extremely broad
absorption-line quasars and to line mis-identifications.
The overall conclusion, however, is that the
completeness and purity of the automated quasar classification
and redshift measurement are quite high.

Figure 12 shows the distribution of error-scaled
redshift differences for 1464 repeat BOSS observations of
confirmed quasars, as well as the redshift-dependent
distributions of statistical single-epoch redshift error
estimates, analogous to Figures 5 and 6 for galaxies. For
quasars, the statistical pipeline redshift errors are
underestimated by a factor of approximately two, although
the true errors in the pipeline quasar redshifts are likely
dominated by systematic effects. Eight of the repeat
observations, or about 0.5%, give a redshift difference of
|Δz| > 0.05, consistent with the rate of catastrophic
errors found by the comparison with the visual inspections.
The redshift range 1.0-2.0 is particularly difficult since
the observed optical spectra do not have either the
narrow [O III] 5007 line or the strong Lyα line to guide the
template fit.

5.5. Stellar radial velocity precision 

We now briefly examine the precision and accuracy 
of BOSS stellar radial velocities based on stellar repeat 
observations. Specifically, we identify 8174 repeat
observations of objects classified as STAR with ZWARNING == 0
for both epochs. In Figure 14, we plot the velocity
difference between the two epochs of these repeats against the
quadrature sum of their statistical error estimates. We 
Fig. 14. Velocity difference between epochs for 8174 stars with
more than one good spectrum in the BOSS data set, as a
function of the quadrature sum of statistical error estimates from the
two differenced epochs. Also plotted are the 16th, 50th, and 84th
percentile curves of this velocity difference (blue) and the expected
statistical ±1σ lines (red).

see that the distribution becomes tighter at higher S/N
as expected, with reasonably good agreement between
estimated statistical error and actual velocity differences
above approximately 15 km s⁻¹ in combined statistical
error (or approximately 10 km s⁻¹ in single-epoch error).
Subtracting the statistical error estimates in quadrature
from the half-difference between the 84th and 16th
percentile velocity differences, and dividing by a factor of
√2 to convert to a single-epoch value, we find a
systematic radial-velocity floor of approximately 4.5 km s⁻¹ at
the high S/N end, comparable to the 4 km s⁻¹ precision
attained for bright stars by the SEGUE project in
SDSS-I/II (Yanny et al. 2009).
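
The floor estimate described above can be written out explicitly. The sketch below follows the three stated steps (percentile half-difference, quadrature subtraction of the combined statistical error, division by √2). The function name and the input percentile values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the BOSS measurements.

```python
import math


def single_epoch_systematic_floor(v16, v84, combined_stat_err):
    """Single-epoch systematic velocity floor from repeat-pair statistics.

    v16, v84: 16th and 84th percentile velocity differences (km/s).
    combined_stat_err: quadrature sum of the two epochs' errors (km/s).
    """
    half_width = 0.5 * (v84 - v16)
    excess_sq = half_width ** 2 - combined_stat_err ** 2
    # Clip at zero when the scatter is fully explained by statistical error.
    two_epoch_floor = math.sqrt(max(excess_sq, 0.0))
    return two_epoch_floor / math.sqrt(2.0)


# Hypothetical inputs: percentiles at -10 and +10 km/s, combined error 6 km/s.
floor = single_epoch_systematic_floor(-10.0, 10.0, 6.0)
print(f"single-epoch systematic floor: {floor:.2f} km/s")
```

The division by √2 reflects the assumption that both epochs contribute equal, independent systematic scatter to the velocity difference.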

6. KNOWN ISSUES 

In order to freeze a set of reductions for collaboration 
analysis and public release, we have accepted the pres- 
ence of a number of known outstanding issues in the soft- 
ware that either were deemed small enough in a statisti- 
cal sense within the survey, or were discovered after the 
software freeze deadline. These issues are documented in 
the following list, and several are illustrated in Figure 15.

1. PCA fits of the GALAXY and QSO classes can some- 
times yield unphysical basis combinations at low 
S/N. This effect is part of the motivation for the
Z_NOQSO redshifts described in §3.2, and is
illustrated in panel "a" of Figure 15. In order to enable
a targeting-blind spectroscopic classification of the 
sort used in SDSS-I/II, this effect could be reme- 
died by priors on physical PCA coefficient com- 
binations, or by non-negativity requirements on 
archetype-based models such as are used for non- 
CV stellar classifications in idlspec2d. These al- 
ternatives are the subject of ongoing development 
for future BOSS data releases. 

2. A small number of type II quasars (e.g.,
Zakamska et al. 2003) at redshift z ~ 0.5 are
selected by the CMASS cuts due to their colors,
but their obscured-AGN spectra are not typical of
the majority of galaxies used to train the galaxy
redshift templates. The inclusion of several such
systems in the galaxy-template training set has
addressed this issue partially, but a number of
these objects have a best-fit galaxy-template
redshift that mistakes broad [O III] 5007 for Hα. Their
quasar-template redshifts are generally correct, but
due to the Z_NOQSO redshift strategy employed for
the BOSS galaxy samples (§3.2), their adopted
redshifts are often in error (see panel "b" of Figure 15).
Since these objects represent such a small percent- 
age of the BOSS galaxy target samples, these errors 
were deemed acceptable for DR9 galaxy-clustering 
analyses. 

The fundamental problem is that the spectra of 
type II quasars are sufficiently different from the 
spectra of most BOSS galaxies that we cannot 
span the space of both categories with the cur- 
rent number of PCA templates (four) in the single 
GALAXY basis set. In future BOSS data releases, 
we anticipate addressing this issue through either 
higher-dimensional basis sets with physical coeffi- 
cient priors, sub-division of the GALAXY class into 
several subclasses each with its own basis set, or an 
archetype-based galaxy redshifting algorithm. 

3. A small number of spectra are affected by
cross-talk from bright stars (generally spectrophotometric
standards) in neighboring fibers. This is often
manifested in a strong break feature at the
dichroic transition around 6000 Å (see panel "c"
of Figure 15), due to different levels of cross-talk
between the red and blue arms of the spectrograph
(Smee et al. 2012). These effects appear to occur
less frequently at later survey dates, presumably 
because of improvements in the operating focus of 
the BOSS spectrographs. We intend to address 
these effects in future BOSS data releases through 
improvements in the extraction codes, and to flag 
any spectra that remain compromised. No masking 
of this effect is implemented for BOSS DR9 data, 
however, except to the extent that it sometimes 
triggers a ZWARNING flag. 

4. As discussed in §5.3 and shown in Figure 9, the
BOSS redshift success rates are somewhat
dependent on fiber number in the sense that fibers near
the edge of the spectrograph camera fields of view
(FIBERID values near 1, 500, and 1000) have lower
success rates. Longer-term development of new
extraction codes based on the 2D PSF-modeling
approach of Bolton & Schlegel (2010) is ongoing, and
may mitigate this problem to a significant extent.

5. A few columns in the BOSS CCDs are bad only in 
a transient sense, and are not included in the bad- 
column masks applied to the CCD frames. These 
columns lead to occasional spectrum artifacts
concentrated near particular fiber numbers (see panel
"d" of Figure 15) that are not masked or flagged.

6. White-dwarf, L-dwarf, carbon-star, and 
cataclysmic-variable star subclasses have less 
accurate template radial-velocity zero-points in 
comparison to the stellar archetypes derived from 
Fig. 15. Mosaic of problematic BOSS spectra. Black lines show data (smoothed over a 5-pixel window), and red lines show 1σ noise
level estimated by the extraction pipeline. Spectra are labeled by PLATE-MJD-FIBERID. All spectra are from the CMASS sample. Individual
objects are: (a) redshift z = 0.589 galaxy for which the overall minimum-χ² fit is an unphysical quasar-class model (yellow), but for which
the NOQSO redshift and class (cyan) are confident and correct, as described in §3.2; (b) type-II quasar with a correct quasar-class redshift
of z = 0.419 (yellow) but an incorrect NOQSO redshift z = 0.083 (cyan) due to confusion of broad [O III] 5007 with Hα; (c) spectrum with
an exaggerated break feature at the 6000 Å dichroic transition, due to cross-talk effects from a bright star in a neighboring fiber, but
for which the pipeline redshift of z = 0.603 (cyan) is nevertheless correct; (d) spectrum affected by a transient bad CCD column in the
region 8000 Å < λ < 9000 Å, with an unphysical galaxy-class model (cyan) for which the SMALL_DELTA_CHI2 bit is set in the ZWARNING_NOQSO
mask; (e) spectral superposition of a G star with an M star (the spectrum is confidently classified as STAR), with the best-fit
G-star-plus-polynomial shown in cyan and the best-fit M-star-plus-polynomial shown in yellow; (f) spectral superposition between a redshift z = 0.291
emission-line galaxy (cyan) and an M star (yellow); (g) spectral superposition between a redshift z = 0.606 absorption-line galaxy (cyan)
and a redshift z = 0.402 absorption-line galaxy (yellow); (h) spectrum of a galaxy for which the pipeline cannot distinguish with statistical
confidence between a redshift of z = 0.576 (cyan) and z = 0.582 (yellow, largely hidden by cyan), and for which the SMALL_DELTA_CHI2 bit
is consequently set in the ZWARNING_NOQSO mask (see §3.1 and §3.2), since these two redshifts differ by more than 1000 km s⁻¹.
the Indo-U.S. library. This issue may be rectified
in future data releases, although the primary 
role of stellar templates in BOSS will remain to 
correctly classify and set aside non-galaxies and 
non-quasars. 

7. Spectra showing superpositions of two objects are
not systematically identified and flagged by the
pipeline. While the majority of BOSS spectra are
of single objects, superpositions do occasionally
occur. In some cases, the inclusion of the
polynomial terms in the redshift model fitting leads
to fits of almost equal quality for the two
components individually, leading to a SMALL_DELTA_CHI2
flag in the ZWARNING (or ZWARNING_NOQSO) mask.
In other cases, one component is dominant and is
identified by the pipeline as the confident
classification and redshift, with the second component
typically identified by one of the lower-quality fits
reported in the spZall file. Various examples of
superposition spectra are displayed in Figure 15,
including star-star (panel "e"), star-galaxy (panel
"f"), and galaxy-galaxy (panel "g"). A systematic
search for superposition spectra in the BOSS
data set by the BOSS Emission-Line Lens Survey
(BELLS; Brownstein et al. 2012) has discovered a
large sample of strong gravitational lens galaxies.

7. SUMMARY AND CONCLUSION 

We have described the "1D" component of the
idlspec2d pipeline that provides automated redshift
measurement and classification for the SDSS-III
BOSS DR9 data set, which comprises 831,000 optical
spectra. This software is substantially similar to the
idlspec2d redshift analysis code used for SDSS-I/II
data, but has been upgraded with new templates and
several new algorithms for application to the BOSS project,
and has been presented in detail for the first time
in this work. The pipeline also provides additional
parameter measurements, including emission-line fits for all
objects, and velocity-dispersion likelihood curves for
objects classified as galaxies. The redshift success rate of
the idlspec2d pipeline is well in excess of the
scientific requirements of the BOSS project. The software
provides first-principles estimates of statistical redshift
errors that are Gaussian distributed and accurate to
within small correction factors. The "2D" component of
the idlspec2d pipeline that extracts spectra from raw
CCD pixels is the subject of Schlegel et al. (2012). Full
data-model information for both the 2D and 1D BOSS
pipeline outputs can be found at the SDSS-III DR9
website (http://www.sdss3.org/dr9/).

Development work continues on data-reduction soft- 
ware for BOSS, both in the calibration and extraction 
of spectra, and in the classification and redshift analysis 
procedures. Subsequent BOSS data releases will be ac- 
companied by similar documentation of the implemented 
results of this ongoing development. 